23 June 2017 © COPYRIGHT MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Becoming a Document Modeling Guru Mike Bowers


Becoming a Document Modeling Guru

Mike Bowers

by Michael Bowers, 2017-05-01, v. 5.4

Becoming a Data Modeling Guru

2017 MarkLogic World

mike@cssDesignPatterns.com

Abstract

• We know how to create great relational database models, but how do we create document/graph models?
• How do we optimize a document/graph model to work great in and out of MarkLogic?
• Do we have to unlearn everything relational?
• Do we need joins?
• Do we need schemas?
• What do we normalize, denormalize, orthogonalize, and generalize?

This session will liberate you from flat tables and limited relationships. You'll learn why it is most natural to represent business entities as hierarchical documents, why graphs are the best way to relate any business entity to any other business entity, and how MarkLogic's unique indexes and query APIs make it easy to query within and join across hierarchical documents.

Why Document Graph?

Relational modeling was revolutionary fifty years ago. We are in a new revolution.

Relational modeling has two major flaws:
1. It forces you to shred business identities into multiple tables.
2. It limits you to a few fixed relationships with implied meaning.

Document Graph modeling improves on relational:
1. It enables you to model business identities as single documents.
2. It frees you to connect business identities in any way with precise semantic meaning.

About the Author

Michael Bowers
• Principal Database Architect
• Using NoSQL professionally for 9 years
• Author
  – Pro CSS and HTML Design Patterns
  – Pro HTML5 and CSS3 Design Patterns
• mike@cssDesignPatterns.com

Church of Jesus Christ of Latter-day Saints

• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Six Data Paradigms

• Dimensional: Kimball data warehousing
• Wide Column: fixed, dense tables with fixed queries and no joins
• Document: sparse, variable data structures
• Relational: fixed, dense tables with flexible queries and joins
• Graph: unlimited relationships
• Key Value: predefined data structures (single values, hashes, lists, sets)

[Figure: the six paradigms arranged along three scales. Workload: low-latency operational velocity versus high-bandwidth analytical volume. Magnitude: response times from hours down to microseconds, data sizes from GB up to PB, throughput from 500 tps to 100K tps. Structure: fixed structure, where code has more dependence on DB structures, versus flexible structure, where code has less dependence on DB structures.]

Top Ten Databases Overall

[Figure: vendor rankings overlaid on a sample model for each paradigm.]

• Graph (RDF): 6 MarkLogic. Sample graph: Surgeon, Operation, Person, and Hospital nodes joined by edges such as performed, on, at, works at, friend of, likes, has poor success at, and fails at.
• Dimensional, Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Dimensional, Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
  Sample star schema: a Drug Dose Facts table (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose) surrounded by Hospital, Surgeon, Operation, and Drug dimensions, each holding a key and attributes.
• Relational, newSQL: 2 Oracle x10
• Relational, SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
  Sample schema: Surgeon (Surgeon Number, Surgeon First Name, Surgeon Last Name), Operation Codes (Operation Code, Operation Name), Operation (Hospital Number, Operation Number, Surgeon Number, Operation Code), and Hospital (Hospital Number, Hospital Name), related by performed at/schedules, performed by/performs, and classified by/classifies.
• Document (XML doc warehouse, JSON document): 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
  Sample document: Hospital Name: Johns Hopkins; Operation Number: 13; Operation Type: Heart Transplant; Surgeon Name: Dorothy Oz; with the drug doses below.
• Key Value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide Column (complex key): 9 DataStax Cassandra

Drug Name    Drug Manufacturer    Dose Size  Dose UOM
Minicillan   Drugs R Us           200        mg
Maxicillan   Canada4Less Drugs    400        mg
Minicillan   Drug USA             150        mg

Do we combine multiple models?

[Figure: the six-paradigm chart (Dimensional, Relational, Wide Column, Key Value, Document, Graph) with its scales for operational velocity versus analytical volume and fixed versus flexible structure, shown with a sample surgeon/operation/hospital graph, star schema, relational schema, and hospital operation document. MarkLogic is labeled at several points on the chart, alongside the captions "1 Multi-model NoSQL Database" and "Operational".]

Multi-Model Enterprise NoSQL Databases

Power of Combining XML Document and RDF Graph

[Figure: the first page of E. F. Codd's paper, overlaid with a graph of typed nodes and labeled edges.]

A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION. This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS. …Tables…represent a major advance toward the goal of data independence…

1.2.1 Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence. …Can application programs…remain invariant as indices come and go? …

1.2.3 Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

Data/Narrative + Relationships (Semantic & Structural) = Contextual Information = Meaningful Knowledge

Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event
Edge labels: geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, solution, purpose

WAIT A MINUTE!

Aren't JSON, XML, and RDF databases hierarchical and graph? Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent. This also works for NoSQL.
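The first answer above, queries that work regardless of hierarchy, can be illustrated with a minimal sketch. This is plain Python, not MarkLogic's API or index implementation; `find_values` and both sample documents are hypothetical names made up for illustration. The point is only that a lookup keyed on a property name, rather than on a fixed path, keeps working when the surrounding hierarchy changes.

```python
from typing import Any, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth of a nested document."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Keep descending: the same key may appear deeper in the tree.
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Two layouts of the same facts: one flat, one restructured (hypothetical shapes).
flat = {"surgeonName": "Dorothy Oz", "operationNumber": 13}
nested = {"operation": {"staff": [{"surgeonName": "Dorothy Oz"}], "operationNumber": 13}}

# The same query is used for both layouts.
print(list(find_values(flat, "surgeonName")))
print(list(find_values(nested, "surgeonName")))
```

A path-based lookup such as `doc["surgeonName"]` would break on the second layout; the name-based lookup does not.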

RDF Graph and XML enable us to turn content into meaningful knowledge.

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational

• A JSON document is a business entity.
  – A document is like a row in a table, and more.
  – A document contains all related tables required by a relational model to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries.
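As a rough sketch of the idea, documents can be held as whole entities while relationships are stored as subject/predicate/object triples that are themselves data. Everything here is illustrative: the in-memory dictionaries stand in for a database, and the predicate names `placedOrder`, `containsProduct`, and `soldBy` are assumptions (only `likesProduct` appears in the deck's own examples).

```python
# Each document is one business entity, keyed by its id.
documents = {
    11: {"type": "customer", "name": "Mike Bowers"},
    1: {"type": "order", "orderNumber": "111-11-111"},
    1111: {"type": "product", "name": "Oreo Thins Sandwich Cookies"},
    555: {"type": "vendor", "name": "Nabisco"},
}

# Relationships are data: (subject id, predicate, object id).
triples = [
    (11, "placedOrder", 1),
    (1, "containsProduct", 1111),
    (11, "likesProduct", 1111),
]


def related(subject: int, predicate: str) -> list:
    """Return the documents that `subject` points at through `predicate`."""
    return [documents[o] for s, p, o in triples if s == subject and p == predicate]


# A new kind of relationship is added at run time, with no schema change.
triples.append((1111, "soldBy", 555))

print(related(11, "likesProduct"))
print(related(1111, "soldBy"))
```

In a relational model, the `soldBy` relationship would need a new foreign key or join table before any such row could be stored.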

RDF = Standard Meaningful Graphs

[Figure: a Customer Order graph. JSON documents for Customers, Orders, Products, and Vendors, plus XML documentation (illustrated by the hospital operation record), are connected by labeled relationships.]

Relationships between entities:
• Product that was ordered
• Vendor who sold the product
• Customer liked product
• Vendor received RMA on product

Documentation attached to entities:
• Shipping instructions, etc.
• Customer invoices, annual reports, etc.
• Customer order documents, invoices, etc.
• Product manual
• Vendor order forms, invoices, etc.

Use Existing Ontologies for Predicate Names

• Save time and make your data easier to understand by leveraging existing relationship ontologies.

TIP: Search for ontologies at Linked Open Vocabularies (LOV).

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
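To make the tip concrete, here is a small sketch that names predicates with terms from two of the listed vocabularies, FOAF and Dublin Core (DCMI Terms), instead of inventing local names. The namespace URIs and the terms `name`, `knows`, `creator`, and `title` are the published ones; the subject identifiers such as `person/11` and the `objects` helper are hypothetical, and plain tuples stand in for a real triple store.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# (subject, predicate, object) with shared-vocabulary predicate names.
triples = [
    ("person/11", FOAF + "name", "Mike Bowers"),
    ("person/11", FOAF + "knows", "person/22"),
    ("book/css-patterns", DCTERMS + "title", "Pro CSS and HTML Design Patterns"),
    ("book/css-patterns", DCTERMS + "creator", "person/11"),
]


def objects(subject: str, predicate: str) -> list:
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects("book/css-patterns", DCTERMS + "creator"))
```

Because the predicate is a well-known URI, another system that already understands Dublin Core can interpret the authorship statement without a mapping step.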

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Fields and properties
  Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
  NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Rows and documents
  Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
  NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Tables and collections
  Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
  NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

Relationships
  Relational: Each table may have zero to a few fixed relationships to other tables. Constraints are not data.
  NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Schemas and databases
  Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
  NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense?

Surgeon ID  Surgeon Name   Surgeon Specialty
1           Dorothy Oz     Cardiothoracic
2           Martin Fields  Neurosurgery

Operation ID  Surgeon ID  Hospital ID
13            1           7

Operation ID  Drug ID  Drug Dose  Drug Dose UOM  Sparse
13            100      200        mg             NULL
13            101      NULL       NULL
13            100      150        mg             NULL

Drug ID  Drug Name
100      Minocycline
101      Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Callouts on the tables:
• Fixed field types
• Fixed table structure
• Fixed table relationships
• Each row has same structure
• Fixed set of tables
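The fixed, dense character of a table can be seen in a few lines with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table and column names are adapted from the tables above, and `drug_route` is a hypothetical new attribute invented for the example. A row cannot carry a property the schema does not define; the schema must be changed first, and every existing row then gains the new column as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE operation_drugs ("
    " operation_id INTEGER NOT NULL,"
    " drug_id INTEGER NOT NULL,"
    " drug_dose INTEGER,"      # nullable: the only way a field can be sparse
    " drug_dose_uom TEXT)"
)
conn.executemany(
    "INSERT INTO operation_drugs VALUES (?, ?, ?, ?)",
    [(13, 100, 200, "mg"), (13, 101, None, None), (13, 100, 150, "mg")],
)

# A property that is not part of the fixed structure is rejected.
try:
    conn.execute(
        "INSERT INTO operation_drugs (operation_id, drug_id, drug_route)"
        " VALUES (13, 100, 'oral')"
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# It takes a schema change to admit the new field.
conn.execute("ALTER TABLE operation_drugs ADD COLUMN drug_route TEXT")
conn.execute(
    "INSERT INTO operation_drugs (operation_id, drug_id, drug_route)"
    " VALUES (13, 100, 'oral')"
)

# Every pre-existing row now has the new column, filled with NULL.
null_routes = conn.execute(
    "SELECT COUNT(*) FROM operation_drugs WHERE drug_route IS NULL"
).fetchone()[0]
print(rejected, null_routes)
```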

How does NoSQL support variety and variability?

NoSQL documents have any variety of rapidly changing structures because structure is data.

    {
      "_id": 1,
      "_type": "operation",
      "collections": ["operation", "transplants"],
      "operation": {
        "hospitalName": "Johns Hopkins",
        "operationTypeName": "Heart Transplant",
        "surgeonName": "Dorothy Oz",
        "operationNumber": 13,
        "administeredDrugs": [
          { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
          { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
          { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
        ]
      },
      "relations": {
        "values": [
          { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
          { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
          { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
          { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
          { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
          { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
        ]
      }
    }

The second administered drug omits drugDoseSize and drugDoseUOM: a sparse property.

Callouts on the document:
• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties
• Sparse Denormalized Properties
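A short sketch of what sparse properties mean for application code: a property that does not apply is simply absent, so readers test for presence rather than comparing against NULL. The data mirrors the administered drugs in the operation document; the `total_dose` helper is a hypothetical name for illustration.

```python
administered_drugs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # dose omitted
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def total_dose(drugs: list, uom: str = "mg") -> int:
    """Sum the doses recorded in the given unit, skipping drugs with no dose."""
    return sum(
        d["drugDoseSize"]
        for d in drugs
        if d.get("drugDoseUOM") == uom and "drugDoseSize" in d
    )


# Absence is itself queryable: which drugs have no recorded dose?
missing_dose = [d["drugName"] for d in administered_drugs if "drugDoseSize" not in d]

print(total_dose(administered_drugs))
print(missing_dose)
```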

    <section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <!--This is a comment-->
      <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
      <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
      <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
        freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
    </section>

    {
      "heading": "JSON is best for nested object DATA",
      "paragraphs": [
        { "paragraph": [
          { "type": "text", "value": "Everything is an object" }
        ] },
        { "paragraph": [
          { "type": "text", "value": "Text exists only in objects" },
          { "type": "text", "value": "Data exists in separate objects" },
          { "type": "date", "value": "2017-04-02Z" },
          { "type": "break", "value": null },
          { "type": "text", "value": "JSON is designed for objects to…" },
          { "type": "text", "value": "be sprinkled with strings" }
        ] }
      ]
    }

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments, no namespaces, no attributes, no CDATA.
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything.
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.


JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.
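The difference shows up as soon as the two formats are parsed. The sketch below uses only Python's standard `json` and `xml.etree.ElementTree` modules on trimmed-down versions of the paragraph examples (namespaces and attributes are dropped to keep it short, which is an assumption of the sketch, not a property of the formats). In XML the text survives as running prose with a date tagged inside it; in JSON the same content is a list of typed objects that code walks by structure.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"
)
json_text = (
    '{"paragraph": ['
    '{"type": "text", "value": "Data exists in separate objects"},'
    '{"type": "date", "value": "2017-04-02Z"}]}'
)

# XML: tags sit on top of the text, so the prose can be read straight through.
paragraph = ET.fromstring(xml_text)
full_text = "".join(paragraph.itertext())
xml_date = paragraph.find("date").text

# JSON: text is poured into objects, so code relies on the predictable structure.
parts = json.loads(json_text)["paragraph"]
json_date = next(p["value"] for p in parts if p["type"] == "date")

print(full_text)
print(xml_date, json_date)
```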


Choosing Between XML and JSON

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Customer Model

• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in relational. All of this should be represented as one JSON document because it is one business entity.
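As a rough illustration of why multivalued properties drive the table count, the sketch below folds rows from three of the person tables into one nested document. The row data is invented sample data in the spirit of the deck's examples, the snake_case column names and the `build_person_document` helper are assumptions, and only three tables are shown rather than all thirteen.

```python
from typing import Any

# Rows as they would sit in three separate relational tables.
persons = [{"person_id": 11, "name": "Mike Bowers"}]
person_phones = [
    {"phone_id": 1, "person_id": 11, "phone_purpose": "mobile", "phone_number": "+1 (111) 111-1111"},
    {"phone_id": 2, "person_id": 11, "phone_purpose": "work", "phone_number": "+1 (111) 111-1112"},
]
person_emails = [
    {"email_id": 1, "person_id": 11, "email_purpose": "work",
     "email_address": "mike@somework.com", "is_primary": True},
]


def build_person_document(person_id: int) -> dict[str, Any]:
    """Gather every row that belongs to one person into a single nested document."""
    person = next(row for row in persons if row["person_id"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": person["name"],
            # Each child table becomes an array inside the document.
            "personPhones": [
                {"phonePurpose": r["phone_purpose"], "phoneNumber": r["phone_number"]}
                for r in person_phones if r["person_id"] == person_id
            ],
            "personEmails": [
                {"emailPurpose": r["email_purpose"], "emailAddress": r["email_address"]}
                for r in person_emails if r["person_id"] == person_id
            ],
        },
    }


doc = build_person_document(11)
print(doc["person"]["personPhones"])
```

The surrogate keys `phone_id` and `email_id` and the repeated `person_id` disappear: containment in the document replaces the join.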

One Person JSON Doc = 13 Tables

    {
      "personId": 11,
      "schemas": [
        { "schema": { "type": "person", "schemaVersion": 1.0 } },
        { "schema": { "type": "customer", "schemaVersion": 1.0 } },
        { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
      ],
      "triples": [
        { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
        { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
        { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
        { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
        { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
      ],
      "person": {
        "personStatus": "active",
        "personName": "Mike Bowers",
        "personOnlineName": "MTB1",
        "personNameVariations": {
          "personFirstName": "Mike",
          "personLastName": "Bowers",
          "middleName": "Thomas",
          "personMaidenName": null,
          "personLegalName": "Michael Thomas Bowers",
          "personSortedName": "Bowers, Mike",
          "personNickname": "Mikey",
          "personInformalLetterName": "Mike",
          "personFormalLetterName": "Mike Bowers"
        },
        "personBirthDate": "1981-01-01",
        "personGender": "male",
        "personEthnicity": "caucasian",
        "personShippingPreference": "free",
        "personPhones": [
          { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
          { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
          { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
        ],
        "personAddresses": [
          { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015", "addressCountry": "United States" } }
        ],
        "personEmails": [
          { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
          { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
        ],
        "personWebsites": [
          { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
          { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
        ],
        "personPaymentMethods": [
          { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                               "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                               "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
          { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                               "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                               "nameOnBankAccount": "Michael Bowers",
                               "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
        ]
      },
      "customer": {
        "customerStatus": "active",
        "customerJoinDate": "2001-01-01",
        "customerReviewerRank": 11111
      },
      "companyRelationships": [
        { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                       "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                       "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                       "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                       "accountRepSupportedProducts": [
                         { "product": { "productIdFk": 1111, "productName": "Oreos" } }
                       ] } },
        { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                       "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                       "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                       "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                       "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
      ]
    }


More Realistic Customer Order Model

[Figure: a large entity-relationship diagram of many interlinked tables, including Order, Order Items, Fact, and Lookup Lists. The remaining table labels and the column lists in the diagram are filler text reused from the person model.]

Same Realistic Customer Order Model

• A JSON document is a business entity.
• Meaningful graph relationships connect business entities.

[Figure: the Customer Order model as JSON documents for Customers, Orders, Products, and Vendors, connected by the relationships "Product that was ordered" and "Vendor who sold the product".]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mike@somework.com ]

personWebsites [

website websitePurpose [ work ] websiteUrl http://www.mike.com ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T09:00:00Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize

• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

[Figure: diagram panels labeled One-to-one, One-to-many, Many-to-many, Reference Tables]
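
The flattening step described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the table and column names (`person_id`, `phone_id`, and so on) are hypothetical, and `phonePurpose` is simplified to a single string rather than the array used in the deck's person document.

```python
# Hypothetical nested record with one multivalued attribute (phones).
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
    ],
}


def normalize_person(record):
    """Split one nested record into single-valued rows for two tables.

    The multivalued attribute becomes a one-to-many child table whose
    rows carry the parent's primary key as a foreign key.
    """
    person_row = {"person_id": record["personId"], "name": record["personName"]}
    phone_rows = [
        {
            "phone_id": i,  # surrogate primary key for the child table
            "person_id": record["personId"],  # foreign key back to the parent
            "purpose": phone["phonePurpose"],
            "number": phone["phoneNumber"],
        }
        for i, phone in enumerate(record["personPhones"], start=1)
    ]
    return person_row, phone_rows


person_row, phone_rows = normalize_person(person)
```

Reassembling the person from `person_row` and `phone_rows` requires a join on `person_id`, which is the cost the document model avoids by keeping the phones embedded.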

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10 ]

triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike personLastName Bowers middleName Thomas
    personMaidenName null personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike personNickname Mikey
    personInformalLetterName Mike personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free

personPhones [
  phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
  phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
  phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [
  address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States ]

personEmails [
  email emailPurpose [ work primary ] emailAddress mike@somework.com
  email emailPurpose [ personal ] emailAddress mike@somewhere.com ]

personWebsites [
  website websitePurpose [ work ] websiteUrl http://www.mike.com
  website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike ]

personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111 driversLicenseState UT ]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company companyIdFk 555 companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company companyIdFk 666 companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
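
The idea of "denormalizing without denormalizing" can be imitated in plain Python. The sketch below is not MarkLogic code and does not use TDE or the Optic API; it only mimics the pattern of projecting document values into a shared triple list and joining through it. The `hasName` predicate, the helper names, and the trimmed-down documents are assumptions made for illustration.

```python
# Two trimmed-down documents; only the fields needed for the join are kept.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "triples": [
        {"subject": 11, "predicate": "hasEmployer", "object": 666},
        {"subject": 11, "predicate": "likesProduct", "object": 1111},
    ],
}
company_doc = {"companyId": 666, "company": {"companyName": "Goliath Dairies"}}


def extract_triples(doc):
    """Project selected document values into (subject, predicate, object) tuples."""
    triples = [
        (t["subject"], t["predicate"], t["object"]) for t in doc.get("triples", [])
    ]
    # Hypothetical template rules: expose each entity's name as a triple.
    if "person" in doc:
        triples.append((doc["personId"], "hasName", doc["person"]["personName"]))
    if "company" in doc:
        triples.append((doc["companyId"], "hasName", doc["company"]["companyName"]))
    return triples


# A stand-in for the triple index: one flat list built from every document.
triple_index = extract_triples(person_doc) + extract_triples(company_doc)


def employer_names(person_id):
    """Join person -> employer -> employer name using only the triple index."""
    employers = {
        o for s, p, o in triple_index if s == person_id and p == "hasEmployer"
    }
    return sorted(o for s, p, o in triple_index if s in employers and p == "hasName")


names = employer_names(11)
```

The company name is stored once, in the company document, yet the query reads it as if it were part of the person document. No copy of the name has to be kept in sync.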

Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: diagram panels labeled Business Tables, Transaction Tables, Reference Tables]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the subclassing in the person document: the customer and companyRelationships sections extend the general person section
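
As a rough illustration of this kind of subclassing, the Python sketch below treats each optional section of a person document as a role the person plays. The exact nesting of the sample document and the `roles_of` helper are assumptions for illustration; the property names follow the deck's person example.

```python
# Trimmed-down person document; the nesting shown here is an assumption.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {
            "company": {
                "companyIdFk": 555,
                "companyName": "Nabisco",
                "personCompanyRelationship": ["employeeOf", "accountRepFor"],
            }
        }
    ],
}


def roles_of(doc):
    """List the roles a person document carries, based on which sections exist."""
    roles = []
    if "customer" in doc:
        roles.append("customer")
    for relationship in doc.get("companyRelationships", []):
        roles.extend(relationship["company"]["personCompanyRelationship"])
    return roles


roles = roles_of(person_doc)
```

A person with no extra sections is still a valid person document, and adding a new role means adding a new section rather than a new table.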

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
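
To see why unique property names matter when an index is keyed by name alone, consider the toy index below. It is a plain Python emulation of the idea, not a description of how MarkLogic implements its indexes; the document URIs and the `index_document` helper are hypothetical.

```python
from collections import defaultdict


def index_document(index, doc_id, node, name=None):
    """Record every (property name, scalar value) pair, ignoring hierarchy."""
    if isinstance(node, dict):
        for child_name, child in node.items():
            index_document(index, doc_id, child, child_name)
    elif isinstance(node, list):
        # Array items inherit the name of the property that holds the array.
        for item in node:
            index_document(index, doc_id, item, name)
    else:
        index[(name, node)].add(doc_id)


docs = {
    "person-11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "order-1": {
        "orderId": 1,
        "order": {"customer": {"customerId": 11, "customerName": "Mike Bowers"}},
    },
}

index = defaultdict(set)
for doc_id, doc in docs.items():
    index_document(index, doc_id, doc)

# Distinct names keep the two meanings apart even though the values match.
person_hits = index[("personName", "Mike Bowers")]
order_hits = index[("customerName", "Mike Bowers")]
```

If both documents had used a generic `name` property, a single lookup on `("name", "Mike Bowers")` would return both documents, and the query would need extra path constraints to tell them apart.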

Agenda

1. Database Modeling Paradigms

2. From Relational to Document Graph

3. Document Advantages

Document PROs: Simplicity

Each Business Entity Is a JSON Document

• The five JSON documents in this model (person, order, product, company, and lookup lists) equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs: Performance

• One JSON document requires one IO.
• Relational requires dozens of IOs for the same data.
  – For example, the person document is 45K and requires 13 tables to represent it in a relational database.
  – You have to join 13 tables to retrieve all of a person's data.
  – Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
  – 4-to-6 IOs x 13 joins = 52-to-78 IOs.
• MarkLogic indexes are in RAM, so getting a doc is 1 IO.
• MongoDB indexes are B-tree indexes and may require 4 IOs.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
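The join arithmetic on this slide can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not a benchmark: the function name `relational_io_range` and its parameters are hypothetical, and the per-join costs are simply the slide's figures (3-to-4 index IOs plus 1-to-2 row IOs).

```python
def relational_io_range(table_count, index_ios=(3, 4), row_ios=(1, 2)):
    """Return the (low, high) IO estimate for reading one entity spread over table_count tables."""
    low_per_join = index_ios[0] + row_ios[0]    # 3 + 1 = 4 IOs
    high_per_join = index_ios[1] + row_ios[1]   # 4 + 2 = 6 IOs
    return table_count * low_per_join, table_count * high_per_join


low, high = relational_io_range(13)
print(f"13 tables: {low}-to-{high} IOs, versus 1 IO for a single document")
```

The range only holds under the slide's assumptions; caching and query plans on a real system would change the numbers.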


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction.

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

  "personAddresses": [
    { "address": {
        "addressPurpose": [ "shipping", "mailing" ],
        "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } }
  ]

The same data in relational tables:
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
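A minimal sketch of querying a multi-valued property follows. It assumes the person document nests each address under an `address` key inside `personAddresses` (the punctuation of the slide's JSON did not survive extraction, so the exact nesting is an assumption), and the helper `addresses_for` is hypothetical.

```python
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],  # multi-valued property
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def addresses_for(doc, purpose):
    """Return every address whose multi-valued purpose list includes `purpose`."""
    return [
        entry["address"]
        for entry in doc["person"]["personAddresses"]
        if purpose in entry["address"]["addressPurpose"]
    ]


shipping = addresses_for(person, "shipping")
print(shipping[0]["addressLocality"])
```

One address serves both purposes without a junction table; the relational version needs a row in Persons Addresses for each purpose.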

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.

  "personPaymentMethods": [
    { "paymentMethod": {
        "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
    { "paymentMethod": {
        "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
  ]
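Application code can handle such a mixed array by inspecting each record's shape. The sketch below is illustrative only: the `describe` helper is hypothetical, the records are trimmed copies of the slide's sample data, and storing account numbers as strings is an assumption.

```python
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "nameOnCreditCard": "MIKE BOWERS"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers"}},
]


def describe(entry):
    """Summarize one payment method, whichever record type it happens to be."""
    method = entry["paymentMethod"]
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:          # credit-card-shaped record
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:         # bank-account-shaped record
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


summaries = [describe(entry) for entry in payment_methods]
print(summaries)
```

A relational design would typically need a supertype table plus one subtype table per payment method to store the same two records.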


Document PROs: Everything is Data

Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays

Because structure is data:
• Structure can be changed by updating data: just write a query.
• Structure can be queried:
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
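To make "structure can be queried" concrete, here is a small sketch in plain Python (not a MarkLogic API). The `key_paths` helper and the `personNickname` key are hypothetical; the sample values come from the person example in this deck.

```python
person = {
    "personName": "Mike Bowers",
    "personPhones": [
        {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
        {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": True}},
    ],
}


def key_paths(node, prefix=""):
    """Yield the path of every key in a nested document, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}"
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, prefix)


paths = set(key_paths(person))

# Query the structure: does any phone carry the optional hidePhoneNumber key?
has_hidden_phone = "/personPhones/phone/hidePhoneNumber" in paths

# Change the structure by updating data: no schema migration is involved.
person["personNickname"] = "Mike"
```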

Document PROs: Simple, Easy Data Types

– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.

  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
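The type rules above can be seen by parsing a small document with Python's standard `json` module. The sample values, including the `mixed` array and `personAge`, are made up for illustration.

```python
import json

text = """
{
  "personName": "Zoë Müller-García 山田",
  "personHeightInFeet": 5.75,
  "personAge": 35,
  "hidePhoneNumber": true,
  "mixed": ["text", 42, 3.5, false, null, {"nested": {"deeper": []}}]
}
"""

doc = json.loads(text)

# One array holds a string, an integer, a decimal, a boolean, a null, and a nested object.
types = [type(value).__name__ for value in doc["mixed"]]
print(types)
```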

Document PROs: Meaningful Data Modeling

[Figure: an excerpt from E. F. Codd, "A Relational Model of Data for Large Shared Data Banks" (June 1970), shown beside two graphs. The first links Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes with relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution". The second connects JSON collections (Customers, Orders, Products, Vendors) and XML Documentation with relationships such as "Customer Order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product".]

Inside and Out




About the Author

Michael Bowers
• Principal Database Architect
• Using NoSQL professionally for 9 years
• Author
  – Pro CSS and HTML Design Patterns
  – Pro HTML5 and CSS3 Design Patterns
• mike@cssDesignPatterns.com

Church of Jesus Christ of Latter-day Saints

• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages

Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Six Data Paradigms

• Dimensional: Kimball data warehousing
• Relational: fixed, dense tables with flexible queries and joins
• Wide Column: fixed, dense tables with fixed queries and no joins
• Key Value: predefined data structures (key/value, hash, list, set)
• Document: sparse, variable data structures
• Graph: unlimited relationships

[Figure: the six paradigms plotted on two axes. One axis runs from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures). The other runs from high-bandwidth analytical volume (hours, minutes, seconds; PB, TB, GB) to low-latency operational velocity (milliseconds, microseconds; 500 tps, 1,000 tps, 10K tps, 100K tps). Each paradigm is illustrated with the same hospital example: Surgeon, Operation, Operation Codes, and Hospital tables for relational; a Drug Dose Facts table with Hospital, Surgeon, Operation, and Drug dimensions for dimensional; Surgeon, Operation, Person, and Hospital nodes linked by "performed on", "works at", "friend of", and "likes" for graph; and an operation record with its administered drugs for document.]

Top Ten Databases Overall

Vendor rank and product, by category as shown on the chart:
• Graph (RDF): 6 MarkLogic
• Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
• newSQL: 2 Oracle x10
• SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Doc Warehouse (XML): 6 MarkLogic
• Document (JSON): 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key/Value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide-Column (complex key): 9 DataStax Cassandra

Do we combine multiple models?

[Figure: the six-paradigm chart from the "Six Data Paradigms" slide, repeated.]

Multi-Model Enterprise NoSQL Databases

[Figure: the same chart with MarkLogic, labeled "1 Multi-model NoSQL Database", marked on the Graph (RDF), Doc Warehouse (XML), and Document (JSON) paradigms and on the operational side.]

Power of Combining XML Document and RDF Graph

Data + Narrative + Relationships (Semantic & Structural)
= Contextual Information = Meaningful Knowledge

A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION. This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS. … Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence. … Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: a knowledge graph of Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes, linked by relationships such as "geo-located in", "printed on", "related topic", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution".]

WAIT A MINUTE

Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
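The first answer, queries that work regardless of hierarchy, can be illustrated in plain Python. This is only a sketch of the idea: `find_values` is a hypothetical helper that walks the document in memory, whereas a database would answer the same question from its indexes. The `staff` key is invented for the example.

```python
def find_values(node, key):
    """Yield every value stored under `key`, no matter how deeply it is nested."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# The same fact stored under two different hierarchies.
flat = {"surgeonName": "Dorothy Oz", "operationNumber": 13}
nested = {"operation": {"operationNumber": 13, "staff": {"surgeonName": "Dorothy Oz"}}}

flat_result = list(find_values(flat, "surgeonName"))
nested_result = list(find_values(nested, "surgeonName"))
print(flat_result == nested_result)
```

Because the lookup names only the key and not its access path, restructuring the document does not break the query, which addresses Codd's access path dependence objection.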

RDF Graph and XML enable us to turn content into meaningful knowledge.

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational

• A JSON document is a business entity.
  – A document is like a row in a table, and more.
  – A document contains all related tables required by a relational model to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries.

[Figure: JSON collections (Customers, Orders, Products, Vendors) and XML Documentation connected by graph relationships: "Customer Order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product". The documentation includes shipping instructions; customer invoices and annual reports; customer order documents and invoices; product manuals; and vendor order forms and invoices.]
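A minimal sketch of documents connected by graph relationships, using plain Python tuples as triples. The document URIs, the predicate names (`placedOrder`, `orderedProduct`, `soldProduct`, `likedProduct`), and both helper functions are hypothetical; the sample values reuse the customer, order, product, and company IDs from this deck.

```python
# Each business entity is one document, addressed by a URI-like key.
documents = {
    "customer/11": {"customerName": "Mike Bowers"},
    "order/1": {"orderNumber": "111-11-111", "orderStatus": "Shipped"},
    "product/1111": {"productName": "Oreo Thins Sandwich Cookies"},
    "company/555": {"companyName": "Nabisco"},
}

# Relationships are data: (subject, predicate, object) triples that can be added at run time.
triples = [
    ("customer/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("company/555", "soldProduct", "product/1111"),
    ("customer/11", "likedProduct", "product/1111"),
]


def related(subject, predicate):
    """Follow a relationship forward: documents that `subject` points to."""
    return [documents[o] for s, p, o in triples if s == subject and p == predicate]


def subjects_of(predicate, obj):
    """Follow a relationship backward: documents that point to `obj`."""
    return [documents[s] for s, p, o in triples if p == predicate and o == obj]


liked = related("customer/11", "likedProduct")
sellers = subjects_of("soldProduct", "product/1111")
print(liked[0]["productName"], "sold by", sellers[0]["companyName"])
```

Adding a new kind of relationship means appending a triple; no table or foreign-key constraint has to be created first.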

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.

TIP: Search for ontologies at Linked Open Vocabularies (LOV).

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
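As a small illustration of reusing existing vocabularies, the sketch below builds triples with FOAF and Dublin Core Terms predicates and serializes one as N-Triples. The `example.org` subjects, the sample values, and the helpers `objects` and `to_ntriples` are assumptions for illustration; only the two namespace IRIs and the property names (`name`, `knows`, `creator`, `title`) come from the published vocabularies.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

triples = [
    ("http://example.org/person/1", FOAF + "name", "Dorothy Oz"),
    ("http://example.org/person/1", FOAF + "knows", "http://example.org/person/2"),
    ("http://example.org/doc/42", DCTERMS + "creator", "http://example.org/person/1"),
    ("http://example.org/doc/42", DCTERMS + "title", "Heart Transplant Report"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def to_ntriples(triple):
    """Serialize one triple as an N-Triples line (IRIs in <>, plain literals in quotes)."""
    s, p, o = triple
    obj = f"<{o}>" if o.startswith("http://") else f'"{o}"'
    return f"<{s}> <{p}> {obj} ."


creators = objects("http://example.org/doc/42", DCTERMS + "creator")
print(to_ntriples(triples[0]))
```

Because the predicates are shared, any tool that understands FOAF or Dublin Core can interpret these relationships without a custom mapping.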

New Data Structures for Variety and Variability

Relational Database (fixed and dense structures, mostly) versus NoSQL Database (sparse and variable structures, mostly):

1. Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
   NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

2. Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
   NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

3. Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
   NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

4. Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
   NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

5. Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
   NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.

Why is relational fixed and dense?

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

• Fixed field types
• Fixed table structure
• Fixed table relationships
• Each row has same structure
• Fixed set of tables
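The fixed, dense behavior described on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name operation_drugs, the column names, and the extra drug_route column are hypothetical, and SQLite stands in for any relational database.

```python
import sqlite3

# Hypothetical table modeled on the operation/drug rows in the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE operation_drugs (
        operation_id  INTEGER NOT NULL,
        drug_id       INTEGER NOT NULL,
        drug_dose     INTEGER,   -- nullable: the only way to be "sparse"
        drug_dose_uom TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO operation_drugs VALUES (?, ?, ?, ?)",
    [(13, 100, 200, "mg"), (13, 101, None, None), (13, 100, 150, "mg")],
)

# A row with one extra field does not fit the fixed structure and is rejected.
try:
    conn.execute(
        "INSERT INTO operation_drugs VALUES (?, ?, ?, ?, ?)",
        (13, 102, 50, "mg", "oral"),
    )
    extra_field_accepted = True
except sqlite3.OperationalError:
    extra_field_accepted = False

# Changing the structure takes schema code, and every existing row gets the new column.
conn.execute("ALTER TABLE operation_drugs ADD COLUMN drug_route TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(operation_drugs)")]
null_doses = conn.execute(
    "SELECT COUNT(*) FROM operation_drugs WHERE drug_dose IS NULL"
).fetchone()[0]
```

The row for drug 101 has no dose, so it must carry NULL placeholders, and the new drug_route column applies to every row whether or not it has a value.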

How does NoSQL support variety and variability?

NoSQL documents have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties (the second administered drug has no dose properties)
• Sparse Denormalized Properties
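To make "structure is data" concrete, here is a minimal Python sketch that treats the operation document as a plain dictionary. The helper name total_dose_mg and the added drugRoute property are hypothetical; nothing here is a MarkLogic API.

```python
import json

# Abbreviated version of the operation document from the slide.
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "surgeonName": "Dorothy Oz",
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin"},  # sparse: no dose properties at all
            {"drugName": "Minocycline", "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
}


def total_dose_mg(doc):
    """Sum the doses recorded in mg, skipping drugs with no dose properties."""
    drugs = doc["operation"]["administeredDrugs"]
    return sum(d["drugDoseSize"] for d in drugs if d.get("drugDoseUOM") == "mg")


# Adding a new property to one subdocument needs no schema change.
operation_doc["operation"]["administeredDrugs"][1]["drugRoute"] = "oral"

total = total_dose_mg(operation_doc)
round_trip = json.loads(json.dumps(operation_doc))
```

Absent properties are simply absent, so no NULL placeholders are stored, and the document still serializes to JSON unchanged after the new property is added.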

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes. No CDATA.
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything.
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.
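The contrast between tagged text and nested objects can be sketched with Python's standard xml.etree.ElementTree and json modules. The two sample strings below are simplified from the slide examples (namespaces and attributes are dropped), so treat them as illustrative assumptions.

```python
import json
import xml.etree.ElementTree as ET

# XML: a date element is mixed directly into running text.
xml_source = (
    "<paragraph>XML allows data, such as this date, "
    "<date>2017-04-02Z</date>, to be freely mixed into the text.</paragraph>"
)
paragraph = ET.fromstring(xml_source)
xml_text = "".join(paragraph.itertext())  # the text reads straight through the tag
xml_date = paragraph.find("date").text    # the tagged data is still addressable

# JSON: text and data each live in their own typed object.
json_source = """
{"paragraph": [
    {"type": "text", "value": "Text exists only in objects. "},
    {"type": "date", "value": "2017-04-02Z"},
    {"type": "text", "value": "Data exists in separate objects."}
]}
"""
parts = json.loads(json_source)["paragraph"]
json_text = "".join(p["value"] for p in parts if p["type"] == "text")
json_dates = [p["value"] for p in parts if p["type"] == "date"]
```

In the XML case the text survives intact when the tags are ignored. In the JSON case the reader must know the object structure to reassemble the text, which is why predictable structures matter more for JSON.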

Choosing Between XML and JSON

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.
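As a rough sketch of what "one business entity, one document" means in practice, the Python below folds rows from three of the person tables into a single nested document. The in-memory row lists, their key names, and the build_person_doc helper are hypothetical stand-ins for real tables and an ETL step.

```python
# Hypothetical rows, one list per relational table.
persons = [{"person_id": 11, "name": "Mike Bowers"}]
person_phones = [
    {"phone_id": 1, "person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"phone_id": 2, "person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails = [
    {"email_id": 1, "person_id": 11, "purpose": "work",
     "address": "mike@somework.com", "is_primary": True},
]


def build_person_doc(person_id):
    """Embed each multivalued property as an array instead of a child table."""
    person = next(p for p in persons if p["person_id"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": person["name"],
            "personPhones": [
                {"phonePurpose": [r["purpose"]], "phoneNumber": r["number"]}
                for r in person_phones
                if r["person_id"] == person_id
            ],
            "personEmails": [
                {"emailPurpose": [r["purpose"]], "emailAddress": r["address"]}
                for r in person_emails
                if r["person_id"] == person_id
            ],
        },
    }


person_doc = build_person_doc(11)
```

The foreign keys (person_id) and surrogate keys (phone_id, email_id) disappear because containment inside the document now expresses the relationship.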

Person ERD

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}

More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Diagram: four JSON document types (Customers, Orders, Products, Vendors) connected by labeled relationships: Customer Order, Product that was ordered, Vendor who sold the product]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Diagram labels: One-to-one, One-to-many, Many-to-many, Reference Tables]

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
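The idea of extracting triples from documents into a separate index can be sketched in plain Python. This is not MarkLogic's TDE or Optic API; the index_triples function and the dictionary-based index are hypothetical stand-ins that only illustrate the concept.

```python
from collections import defaultdict

# Abbreviated person document with embedded triples, as in the slide example.
person_doc = {
    "personId": 11,
    "triples": [
        {"subject": 11, "predicate": "hasSpouse", "object": 22},
        {"subject": 11, "predicate": "hasChild", "object": 33},
        {"subject": 11, "predicate": "hasChild", "object": 44},
        {"subject": 11, "predicate": "hasEmployer", "object": 666},
    ],
}


def index_triples(docs):
    """Build a predicate -> [(subject, object), ...] index from document triples."""
    by_predicate = defaultdict(list)
    for doc in docs:
        for triple in doc.get("triples", []):
            by_predicate[triple["predicate"]].append(
                (triple["subject"], triple["object"])
            )
    return by_predicate


triple_index = index_triples([person_doc])
children_of_11 = [obj for subj, obj in triple_index["hasChild"] if subj == 11]
```

The document stays the single source of truth, and the index is derived from it, so relationship queries do not need to open every document.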

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
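A minimal Python sketch of "denormalizing without denormalizing" follows: names are extracted from each document once, and a linking triple lets a query read the employer's name without copying it into the person document. The data structures and the employer_names helper are hypothetical; MarkLogic's TDE and Optic API are not used here.

```python
# Two abbreviated documents, each the single source of truth for its entity.
documents = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers"}},
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
}

# A triple that links the person to the company.
triples = [(11, "hasEmployer", 666)]


def extract_names(docs):
    """Pull one display name per document into a separate lookup (the 'index')."""
    names = {}
    for doc_id, doc in docs.items():
        if "person" in doc:
            names[doc_id] = doc["person"]["personName"]
        elif "company" in doc:
            names[doc_id] = doc["company"]["companyName"]
    return names


def employer_names(person_id, names):
    """Follow hasEmployer triples and read the linked names from the index."""
    return [
        names[obj]
        for subj, predicate, obj in triples
        if subj == person_id and predicate == "hasEmployer"
    ]


name_index = extract_names(documents)
employers = employer_names(11, name_index)
```

If the company is renamed, only the company document changes and the index is rebuilt from it. The person document never holds a stale copy of the name.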

Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Diagram labels: Business Tables, Reference Tables, Transaction Tables]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc
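The third bullet can be sketched in code. This is a minimal illustration, not the author's implementation: the property names follow the slide's sample documents, while the `build_order` helper and the replacement town `Layton` are hypothetical. The order keeps its own copy of the customer name and shipping address, so it still describes the whole transaction after the person document changes.

```python
import copy

# Person document trimmed from the slide's sample data.
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }
        ],
    },
}


def build_order(order_id, person_doc, purpose="shipping"):
    """Hypothetical helper: build an order that carries the customer data it needs."""
    body = person_doc["person"]
    # Pick the first address tagged with the requested purpose.
    address = next(
        a for a in body["personAddresses"] if purpose in a["addressPurpose"]
    )
    # Deep-copy so later edits to the person do not rewrite the order.
    shipping = copy.deepcopy(address)
    shipping.pop("addressPurpose")
    return {
        "orderId": order_id,
        "order": {
            "customer": {
                "customerId": person_doc["personId"],
                "customerName": body["personName"],
            },
            "orderShippingAddress": shipping,
        },
    }


order = build_order(1, person)
# The customer moves (hypothetical new town); the order keeps its snapshot.
person["person"]["personAddresses"][0]["addressLocality"] = "Layton"
print(order["order"]["orderShippingAddress"]["addressLocality"])  # Clinton
```

The trade-off is the usual one for copied data: the order is self-contained and cheap to read, but the copy is a snapshot and is not updated when the person changes.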

Relational Modeling Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational, because it hides the purpose of the model

DocGraph Modeling Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]

triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null
]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
  personFirstName Mike
  personLastName Bowers
  middleName Thomas
  personMaidenName null
  personLegalName Michael Thomas Bowers
  personSortedName Bowers Mike
  personNickname Mikey
  personInformalLetterName Mike
  personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free

personPhones [
  phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
  phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
  phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
]

personAddresses [
  address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
]

personEmails [
  email emailPurpose [ work primary ] emailAddress mike@somework.com
  email emailPurpose [ personal ] emailAddress mike@somewhere.com
]

personWebsites [
  website websitePurpose [ work ] websiteUrl http://www.mike.com
  website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike
]

personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111 driversLicenseState UT
]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company companyIdFk 555 companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company companyIdFk 666 companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]
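A rough sketch of how an application might read the subclassing in the document above. The property names come from the slide, but the nesting shown here and the two helper functions are assumptions made for illustration. The point is that one general person document declares which roles it plays, and each role adds its own section without hiding what the document is for.

```python
# Trimmed version of the slide's person document (nesting is assumed).
doc = {
    "personId": 11,
    "schemas": [
        {"type": "person"},
        {"type": "customer"},
        {"type": "companyRelationships"},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {
            "companyIdFk": 555,
            "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        },
        {
            "companyIdFk": 666,
            "companyName": "Goliath Dairies",
            "personCompanyRelationship": [
                "independentContractorFor",
                "deliveryPersonFor",
            ],
        },
    ],
}


def declared_types(document):
    """Hypothetical helper: list the schema types a document says it conforms to."""
    return [schema["type"] for schema in document.get("schemas", [])]


def company_roles(document):
    """Hypothetical helper: map each related company to the person's roles there."""
    return {
        rel["companyName"]: rel["personCompanyRelationship"]
        for rel in document.get("companyRelationships", [])
    }


print(declared_types(doc))            # ['person', 'customer', 'companyRelationships']
print(company_roles(doc)["Nabisco"])  # ['employeeOf', 'accountRepFor']
```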

Relational Modeling Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
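The reasoning behind the unique-name rule can be illustrated with a toy index. This sketch is not MarkLogic's index implementation; it only assumes, as the bullet above says, that equality lookups are keyed by property name and value with no regard to position in the hierarchy. With a generic name such as `status`, a lookup matches every kind of entity; with `customerStatus` it matches only customers.

```python
from collections import defaultdict


def build_name_index(docs):
    """Toy index: (property name, value) -> ids of documents containing that pair."""
    index = defaultdict(set)

    def walk(key, node, doc_id):
        if isinstance(node, dict):
            for child_key, child in node.items():
                walk(child_key, child, doc_id)
        elif isinstance(node, list):
            # Array items inherit the name of the property that holds the array.
            for item in node:
                walk(key, item, doc_id)
        else:
            index[(key, node)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(None, doc, doc_id)
    return index


# Generic property names: "status" means three different things.
generic = build_name_index({
    "person/11": {"person": {"status": "active", "customer": {"status": "active"}}},
    "company/555": {"company": {"status": "active"}},
})

# Unique property names, as in the slide's documents.
unique = build_name_index({
    "person/11": {
        "person": {"personStatus": "active", "customer": {"customerStatus": "active"}}
    },
    "company/555": {"company": {"companyStatus": "active"}},
})

print(sorted(generic[("status", "active")]))         # ['company/555', 'person/11']
print(sorted(unique[("customerStatus", "active")]))  # ['person/11']
```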


Agenda

1. Database Modeling Paradigms

2. From Relational to Document Graph

3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Document PROs Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database

• You have to join 13 tables to retrieve all of a person's data

• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row

• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
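The IO estimate above is simple range arithmetic, worked here as a short sketch. The per-join figures are the slide's own; the function name is hypothetical, and the result is a back-of-the-envelope count rather than a benchmark.

```python
def relational_io_range(joins, index_ios, row_ios):
    """Return (low, high) total IOs when every join pays index IOs plus row IOs."""
    per_join_low = index_ios[0] + row_ios[0]
    per_join_high = index_ios[1] + row_ios[1]
    return joins * per_join_low, joins * per_join_high


# Slide figures: 13 joins, 3-to-4 index IOs and 1-to-2 row IOs per join.
low, high = relational_io_range(joins=13, index_ios=(3, 4), row_ios=(1, 2))
print(low, high)  # 52 78

# Slide figure for a document read when the indexes are already in RAM.
document_ios = 1
print(low // document_ios, high // document_ios)  # 52 78
```

Under these assumptions, reading one person costs roughly 52 to 78 times as many IOs in the relational layout as in the single-document layout.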


Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction

Document PROs Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
]

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
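To make the comparison concrete, here is a sketch that shreds the single person document into rows for the three tables listed above. The row layout and the surrogate `addressId` are assumptions for illustration; the slide does not show the relational rows themselves.

```python
doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }
        ],
    },
}


def shred_person(document):
    """Split one person document into rows for three hypothetical tables."""
    body = document["person"]
    persons = [{"personId": document["personId"], "name": body["personName"]}]
    addresses = []
    persons_addresses = []
    for address_id, addr in enumerate(body["personAddresses"], start=1):
        addresses.append({
            "addressId": address_id,  # surrogate key invented for this sketch
            "street": ", ".join(addr["addressStreet"]),
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postalCode": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        # Each purpose in the multi-valued property becomes its own join row.
        for purpose in addr["addressPurpose"]:
            persons_addresses.append({
                "personId": document["personId"],
                "addressId": address_id,
                "addressPurpose": purpose,
            })
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred_person(doc)
print(len(persons), len(addresses), len(persons_addresses))  # 1 1 2
```

One document with one two-valued array turns into four rows across three tables, and reading it back requires joining them again.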

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed

• Mixing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111 driversLicenseState UT
]
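A short sketch of why a multi-typed array is natural in application code. The records mirror the slide's payment methods; the `describe` function and its output wording are hypothetical. Each record carries only the properties its type needs, and the code branches on `paymentMethodType`.

```python
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    },
]


def describe(method):
    """Summarize a payment method using only the fields its type provides."""
    kind = method["paymentMethodType"]
    if kind == "Checking":
        return f"Checking account ending in {method['bankAccountNumber'][-4:]}"
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    return f"Unrecognized payment method: {kind}"


for method in payment_methods:
    print(describe(method))
# Visa card ending in 1234
# Checking account ending in 1111
```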


Document PROs Everything is Data

Everything is data in JSON

• Structure
• Keys
• Values
• Arrays

Because structure is data

• Structure can be changed by updating data; just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values

personId 11

person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 575
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
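A minimal sketch of querying and changing structure as data, using the phone example above. The `find_paths` function is hypothetical and runs in plain Python; it stands in for whatever query language the database provides. It finds every place a key occurs, at any depth, and then a plain data update changes the structure.

```python
doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {
                "phonePurpose": ["home"],
                "phoneNumber": "+1 (222) 222-2224",
                "hidePhoneNumber": True,
            },
        ],
    },
}


def find_paths(node, key, path=()):
    """Yield the path to every occurrence of `key`, at any nesting depth."""
    if isinstance(node, dict):
        for child_key, child in node.items():
            if child_key == key:
                yield path + (child_key,)
            yield from find_paths(child, key, path + (child_key,))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from find_paths(item, key, path + (position,))


# Query structure: which phones carry the optional hidePhoneNumber key?
print(list(find_paths(doc, "hidePhoneNumber")))
# [('person', 'personPhones', 1, 'hidePhoneNumber')]

# Change structure by updating data: add the key to the first phone.
doc["person"]["personPhones"][0]["hidePhoneNumber"] = False
print(len(list(find_paths(doc, "hidePhoneNumber"))))  # 2
```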

Document PROs Simple Easy Data Types

- Attribute names have unlimited length and can contain any characters
- Strings have unlimited size and can contain any internationalized UTF-8 characters
- Numbers can be integers or decimals
- Booleans are true or false
- Arrays can have values of any type and can mix types
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
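As a quick check of the list above, the sketch below parses a small JSON text with Python's standard `json` module and prints the type each value maps to. The property names and values are made up for illustration (including the decimal `5.75`), and the Python type names are specific to this parser.

```python
import json

# Hypothetical JSON text exercising each data type from the list above.
text = """
{
  "personName": "Zoë Müller",
  "personHeightInFeet": 5.75,
  "childCount": 2,
  "hidePhoneNumber": true,
  "mixedArray": ["a", 1, 2.5, false, null, {"nested": {}}],
  "sparseObject": {}
}
"""

doc = json.loads(text)

# Top-level values: string, decimal, integer, boolean, array, object.
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)

# One array can mix every type, including nested objects and null.
mixed_types = [type(item).__name__ for item in doc["mixedArray"]]
print(mixed_types)  # ['str', 'int', 'float', 'bool', 'NoneType', 'dict']
```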

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency. …The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS …Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence. …Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: graph diagram. Node types: Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), Event (E). Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. JSON documents for Customers, Orders, Products, and Vendors, plus XML Documentation, connected by labeled relationships: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Sample XML document shown in the figure:

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size   Dose UOM
Minicillan    Drugs R Us            200         mg
Maxicillan    Canada4Less Drugs     400         mg
Minicillan    Drug USA              150         mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledgeWhat can RDF Graph and JSON do for data
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name:     Johns Hopkins
Operation Number:  13
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size   Dose UOM
Minicillan   Drugs R Us           200         mg
Maxicillan   Canada4Less Drugs    400         mg
Minicillan   Drug USA             150         mg



Abstract
  • We know how to create great relational database models, but how do we create document/graph models?
  • How do we optimize a document/graph model to work great in and out of MarkLogic?
  • Do we have to unlearn everything relational?
  • Do we need joins?
  • Do we need schemas?
  • What do we normalize, denormalize, orthogonalize, and generalize?

This session will liberate you from flat tables and limited relationships. You'll learn why it is most natural to represent business entities as hierarchical documents, why graphs are the best way to relate any business entity to any other business entity, and how MarkLogic's unique indexes and query APIs make it easy to query within and join across hierarchical documents.

Why Document Graph?
Relational modeling was revolutionary fifty years ago.
We are in a new revolution.

Relational modeling: two major flaws
  1. Forces you to shred business identities into multiple tables
  2. Limits you to a few fixed relationships with implied meaning

Document Graph modeling improves on Relational
  1. Enables you to model business identities as single documents
  2. Frees you to connect business identities in any way with precise semantic meaning

About the Author
Michael Bowers
  • Principal Database Architect
  • Using NoSQL professionally for 9 years
  • Author
      – Pro CSS and HTML Design Patterns
      – Pro HTML5 and CSS3 Design Patterns
  • mike@cssDesignPatterns.com

Church of Jesus Christ of Latter-day Saints
  • Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
  • 15.6 million members (30,016 congregations worldwide)
  • Humanitarian assistance in 186 countries
  • Thousands of documents in 188 published languages

Agenda
  1. Database Modeling Paradigms
  2. From Relational to Document Graph
  3. Document Advantages

Six Data Paradigms

[Figure: six data paradigms plotted by structure and workload]
Axes:
  • Fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures)
  • Low-latency operational velocity: 500 tps, 1,000 tps, 10K tps, 100K tps
  • High-bandwidth analytical volume: GB, TB, PB
  • Response time: hours, minutes, seconds, milliseconds, microseconds

Paradigms:
  • Dimensional: Kimball data warehousing
  • Relational: fixed, dense tables with flexible queries & joins
  • Wide Column: fixed, dense tables w/ fixed queries, no joins
  • Key Value: predefined data structures (key: value 1; hash [key 1: value 1, key 2: value 2]; list [value 1, value 2]; set [value 1, value 2])
  • Document: sparse, variable data structures
  • Graph: unlimited relationships

Top Ten Databases Overall

[Figure: database categories, each with a sample model and its ranked products; the rank shown on the slide precedes each product name]

  • Graph (RDF): 6 MarkLogic
  • Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
  • Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
  • newSQL: 2 Oracle x10
  • SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
  • Doc Warehouse (XML): 6 MarkLogic
  • Document (JSON): 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
  • Key-Value (simple key): 5 AWS DynamoDB, 10 Redis
  • Wide-Column (complex key): 9 DataStax Cassandra

Sample models in the figure:
  • Graph: Surgeon, Operation, Person, and Hospital nodes connected by "performed", "on", "works at", "at", "friend of", "likes", "has poor success at", and "fails at"
  • Data Warehouse and Live Analytics (star schema): Drug Dose Facts (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose) with Hospital, Surgeon, Operation, and Drug dimensions, each holding a key and attributes
  • SQL and newSQL: Surgeon (Surgeon Number, Surgeon First Name, Surgeon Last Name), Operation Codes (Operation Code, Operation Name), Operation (Hospital Number, Operation Number, Surgeon Number, Operation Code), Hospital (Hospital Number, Hospital Name); relationships: performed at / schedules, performed by / performs, classified by / classifies
  • Doc Warehouse: the hospital operation XML document (Hospital Name, Operation Number, Operation Type, Surgeon Name, and a drug table)

Category legend: Graph, Dimensional, Relational, Document, Wide Column, Key Value

Do we combine multiple models?

[Figure: the six data paradigms chart (Dimensional, Relational, Wide Column, Key Value, Document, Graph) on the same axes: fixed structure to flexible structure, low-latency operational velocity (500 tps to 100K tps), high-bandwidth analytical volume (GB to PB), and response time (hours to microseconds)]

Multi-Model Enterprise NoSQL Databases

[Figure: the database-category chart with MarkLogic labeled "1 Multi-model NoSQL Database". MarkLogic is marked on Graph (RDF), Doc Warehouse (XML), Document (JSON), and Operational. The other categories shown are Key-Value (simple key), Wide-Column (complex key), newSQL, SQL, Live Analytics, and Data Warehouse.]

Category legend: Graph, Dimensional, Relational, Document, Wide Column, Key Value

Power of Combining XML Document and RDF Graph

Data + Narrative + Relationships = Contextual Information
(Semantic & Structural) = Meaningful Knowledge

[Figure: an XML document (excerpts from Codd's paper) annotated with an RDF graph]

A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths

Graph node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event
Graph relationships: geo-located in, printed on, related topic, author of, published article in journal, publisher of
Topic roles: problem, solution, purpose

WAIT A MINUTE
Aren't JSON, XML, and RDF databases hierarchical and graph?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

  1. They break programs when the data structure's hierarchy changes.
      – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
  2. They contain redundant data throughout the hierarchy that is hard to update.
      – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
  3. They are easily inconsistent because it is easy to fail to update redundant data.
      – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
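The first rebuttal (queries that work regardless of hierarchy) can be sketched in a few lines of Python. This is only an illustration of the idea, not MarkLogic's API or index implementation: `find_values` is a hypothetical helper, and the two sample documents are invented to show the same property at different depths.

```python
from typing import Any, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth of a JSON-like structure."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Keep descending so the lookup does not depend on the access path.
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# The same fact stored at two different places in the hierarchy (hypothetical documents).
flat_doc = {"surgeonName": "Dorothy Oz", "operationNumber": 13}
nested_doc = {"operation": {"staff": {"surgeonName": "Dorothy Oz"}, "operationNumber": 13}}

flat_result = list(find_values(flat_doc, "surgeonName"))
nested_result = list(find_values(nested_doc, "surgeonName"))
print(flat_result, nested_result)
```

The calling code asks for a property by name and never spells out a path, so moving `surgeonName` deeper into the document does not break it. That is the access-path dependence Codd warned about, avoided at the query layer.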

RDF Graph and XML enable us to turn content into meaningful knowledge.

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational
  • A JSON document is a business entity
      – A document is like a row in a table, and more
      – A document contains all related tables required by a relational model to represent one business entity
  • Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries

[Figure: Customer Order document graph]
Document collections: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), Documentation (XML)
Labels on the connections:
  • Product that was ordered
  • Vendor who sold the product
  • Shipping instructions, etc.
  • Customer invoices, annual reports, etc.
  • Customer order documents, invoices, etc.
  • Product manual
  • Vendor order forms, invoices, etc.
  • Customer liked product
  • Vendor received RMA on product
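As a rough illustration of relationships held as data, the sketch below keeps subject/predicate/object triples in a plain Python list and adds a new kind of relationship at run time. The document URIs, the predicate names (`orderedBy`, `orderedProduct`, `soldProduct`, `likedProduct`, `receivedRmaOn`), and the `match` helper are all hypothetical; a real system would use an RDF triple store and SPARQL rather than a list scan.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def match(store: List[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every pattern part that is not None."""
    return [t for t in store
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.object == obj)]


# Hypothetical relationships between customer, order, product, and vendor documents.
triples = [
    Triple("order/1", "orderedBy", "customer/11"),
    Triple("order/1", "orderedProduct", "product/1111"),
    Triple("vendor/555", "soldProduct", "product/1111"),
    Triple("customer/11", "likedProduct", "product/1111"),
]

about_product_before = match(triples, obj="product/1111")

# A new kind of relationship is just one more row of data: no schema change.
triples.append(Triple("vendor/555", "receivedRmaOn", "product/1111"))
about_product_after = match(triples, obj="product/1111")

print(len(about_product_before), len(about_product_after))
```

The point of the sketch is the last step: connecting a vendor to a product in a new way needed no new table and no foreign-key constraint, only another triple.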

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
  • Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

  • Dublin Core
  • FOAF
  • TrackBack
  • MetaVocab
  • Basic Geo Vocabulary
  • BIO
  • RSS 1.0
  • VCard RDF
  • Creative Commons metadata
  • WOT
  • SIOC
  • GoodRelations
  • DOAP
  • Programmes Ontology
  • Music Ontology
  • OpenGUID
  • Provenance Vocabulary
  • Pedagogical diagnosis
  • DILIGENT Argumentation

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Fields / properties
  Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
  NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Rows / documents
  Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
  NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Tables / collections
  Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
  NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

Relationships
  Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
  NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Schemas / databases
  Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
  NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense?

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Callouts:
  • Fixed field types
  • Fixed table structure
  • Fixed table relationships
  • Each row has same structure
  • Fixed set of tables
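The fixed row structure described on this slide can be demonstrated with Python's built-in `sqlite3` module. This is a minimal sketch: the snake_case table and column names are adapted from the slide, and the `drug_route` column is a made-up example of a property that was not planned for when the table was created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE operation_drug (
           operation_id  INTEGER NOT NULL,
           drug_id       INTEGER NOT NULL,
           drug_dose     INTEGER,          -- nullable: a sparse value in a dense row
           drug_dose_uom TEXT
       )"""
)
conn.executemany(
    "INSERT INTO operation_drug VALUES (?, ?, ?, ?)",
    [(13, 100, 200, "mg"), (13, 101, None, None), (13, 100, 150, "mg")],
)

# A row that carries a property the schema does not declare is rejected outright.
try:
    conn.execute(
        "INSERT INTO operation_drug (operation_id, drug_id, drug_route) VALUES (13, 100, 'oral')"
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# It takes a schema change (code) before any row may hold the new property ...
conn.execute("ALTER TABLE operation_drug ADD COLUMN drug_route TEXT")
conn.execute(
    "INSERT INTO operation_drug (operation_id, drug_id, drug_route) VALUES (13, 100, 'oral')"
)

# ... and afterwards every existing row carries the new column, filled with NULL.
columns = [row[1] for row in conn.execute("PRAGMA table_info(operation_drug)")]
null_routes = conn.execute(
    "SELECT COUNT(*) FROM operation_drug WHERE drug_route IS NULL"
).fetchone()[0]
print(rejected, columns, null_routes)
```

Every row ends up with the same five fields whether or not the value applies to it, which is what "fixed and dense" means in practice.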

How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Callouts:
  • Variable data types
  • Variable document types
  • Variable relationships
  • Variable and sparse collections
  • Variable document structures
  • Sparse properties (the second administered drug has no dose size or UOM)
  • Sparse denormalized properties (for example, hospitalAddress and surgeonSuccessRate copied onto the relations)
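A short Python sketch of what sparse properties mean for application code follows. The `administered_drugs` list copies the three drug entries from the slide; the variable names and the choice to total only `mg` doses are illustrative assumptions.

```python
# The administeredDrugs array from the slide: the second entry omits the dose properties.
administered_drugs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]

# A sparse property is simply absent, so code tests for presence instead of checking for NULL.
dosed = [drug for drug in administered_drugs if "drugDoseSize" in drug]
total_mg = sum(drug["drugDoseSize"] for drug in dosed if drug.get("drugDoseUOM") == "mg")
missing_dose = [drug["drugName"] for drug in administered_drugs if "drugDoseSize" not in drug]

print(total_mg, missing_dose)
```

In the relational version of this data the second row has to store two NULLs. Here the document just leaves the properties out, and the structure of each entry is itself part of the data.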

Choosing Between XML and JSON

XML is ideal for content.
Developers work with tagged content with minimal reliance on variable structure.

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date, <date xsi:type="xsd:date">2017-04-02Z</date>, to be
    freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

XML
  1. Document type
  2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything.
  3. Best for marking up natural human languages
  4. Best for structured text (tags added on top of text)
  5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures.
Developers work with data with maximal reliance on predictable structures.

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects:" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." } ] }
  ]
}

JSON
  1. No document type
  2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA.
  3. Best for object-oriented computer languages
  4. Best for structured data (text poured into objects)
  5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
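The difference between "tags added on top of text" and "text poured into objects" shows up directly when parsing. The sketch below uses Python's standard `xml.etree.ElementTree` and `json` modules on shortened versions of the slide's samples; the trimmed snippets and variable names are assumptions made for brevity.

```python
import json
import xml.etree.ElementTree as ET

# XML: the date is a tag dropped into the middle of running text (mixed content).
xml_text = (
    '<paragraph xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "XML allows data, such as this date "
    '<date xsi:type="xsd:date">2017-04-02Z</date>'
    " to be freely mixed into the text."
    "</paragraph>"
)
paragraph = ET.fromstring(xml_text)
xml_date = paragraph.find("date").text
# Removing the markup leaves the sentence intact: the text came first, the tags second.
xml_sentence = "".join(paragraph.itertext())

# JSON: every fragment of text or data has to live in its own object.
json_text = """
{"paragraph": [
    {"type": "text", "value": "Data exists in separate objects:"},
    {"type": "date", "value": "2017-04-02Z"}
]}
"""
doc = json.loads(json_text)
json_dates = [part["value"] for part in doc["paragraph"] if part["type"] == "date"]

print(xml_sentence)
print(xml_date, json_dates)
```

Both formats can carry the same date, but only the XML version still reads as a sentence once the structure is stripped away, which is why it suits marked-up content and JSON suits data structures.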

Agenda
  1. Database Modeling Paradigms
  2. From Relational to Document Graph
  3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Customer Model

[Figure: entity-relationship diagram with the following tables]
  • Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
  • Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
  • Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
  • Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
  • Addresses: Address ID, Street, Locality, Region, Postal Code, Country
  • Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
  • Websites: Website ID, URL
  • Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
  • Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
  • Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
  • Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
  • Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
  • Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
  • Companies: Company ID
  • Companies Websites: Company ID, Website ID
  • Companies Addresses: Company ID, Address ID


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.

All of this should be represented as one JSON document because it is one business entity.
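To see why multivalued properties multiply tables, the following sketch "shreds" a cut-down person document into relational-style rows. It is an illustration only: `shred` is a hypothetical helper, the document is a simplified subset of the person example (phones and emails only, without wrapper objects), and joining the purpose list into one string is a shortcut that a fully normalized model would replace with yet another table.

```python
from collections import defaultdict

# A cut-down person document: two multivalued properties, each an array of objects.
person_doc = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
    "personEmails": [
        {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"},
    ],
}


def shred(doc: dict, id_key: str = "personId") -> dict:
    """Split one document into a parent row plus one child table per multivalued property."""
    tables = defaultdict(list)
    parent_row = {}
    for key, value in doc.items():
        if isinstance(value, list):
            for item in value:
                # Each child row needs a foreign key back to the parent.
                child_row = {id_key: doc[id_key]}
                for name, cell in item.items():
                    child_row[name] = ", ".join(cell) if isinstance(cell, list) else cell
                tables[key].append(child_row)
        else:
            parent_row[key] = value
    tables["persons"].append(parent_row)
    return dict(tables)


tables = shred(person_doc)
table_names = sorted(tables)
print(table_names)
```

Two array properties already produce three tables joined by `personId`. The full person document, with addresses, websites, payment methods, and company relationships, grows to the 13 tables of the ERD in the same way.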

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 10 } },
    { "schema": { "type": "customer", "schemaVersion": 10 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 10 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"],
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}


More Realistic Customer Order Model

[Figure: large entity-relationship diagram of the customer order model; only placeholder table labels and repeated column lists survive extraction]

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Diagram: customer order model with four kinds of JSON documents (Customers, Orders, Products, Vendors) connected by graph relationships such as "Product that was ordered" and "Vendor who sold the product".]

Same Realistic Customer Order Model

{ "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 } }

{ "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": ...
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
          ...

{ "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", ...
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2...
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu...
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ ...

{ "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", ...
      { "phone": { "phonePurpose": "support", ...
      { "phone": { "phonePurpose": "shipping", ...
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", ...
          "addressLocality": "East Hanove...
          "addressPostalCode": "07936", ...
    "companyEmails": [
      { "email": { "emailPurpose": "sales", ...
    "companyWebsites": [
      { "website": { "websitePurpose": "home", ...
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che...
          "paymentMethodAccountNumber": 555...
          "paymentMethodVerified": true ...
  "supplier": { "supplierJoinDate": "2005-05...
  "seller": { "sellerJoinDate": "2005-05...

{ "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ] }

Relational Modeling Normalize

1. Normalize

• Make each attribute single-valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

[Figure: example tables illustrating one-to-one, one-to-many, and many-to-many relationships and reference tables.]
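The normalization steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `shred_person` and the column names are hypothetical, and the sample values are borrowed from the person example used throughout this deck. It shows a multivalued attribute (phones) being flattened into a second table joined back by a foreign key.

```python
def shred_person(person):
    """Flatten one nested person record into single-valued relational rows."""
    person_rows = [{"person_id": person["personId"], "name": person["personName"]}]
    phone_rows = [
        {
            "phone_id": phone_id,
            "person_id": person["personId"],  # foreign key back to the person table
            "phone_purpose": phone["phonePurpose"],
            "phone_number": phone["phoneNumber"],
        }
        for phone_id, phone in enumerate(person["personPhones"], start=1)
    ]
    return person_rows, phone_rows


person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
        {"phonePurpose": "home", "phoneNumber": "+1 (111) 111-1113"},
    ],
}

# One business entity becomes rows in two tables (one-to-many).
person_rows, phone_rows = shred_person(person)
```

Reading the person back now needs a join on `person_id`, which is the cost the document model avoids.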

DocGraph Modeling Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it

{ "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": {
    "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ] }
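As a rough illustration of projecting triples out of a document's embedded structures, here is a plain-Python sketch. It is not TDE template syntax and does not call any MarkLogic API; `extract_triples` and the predicate names `hasName` and `hasPhoneNumber` are assumptions made for the example.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}


def extract_triples(doc):
    """Project (subject, predicate, object) rows from one person document."""
    subject = doc["personId"]
    person = doc["person"]
    triples = [(subject, "hasName", person["personName"])]
    # Each item of the embedded array yields its own triple.
    for item in person["personPhones"]:
        triples.append((subject, "hasPhoneNumber", item["phone"]["phoneNumber"]))
    return triples


triples = extract_triples(person_doc)
```

The document stays the single source of truth; the triples are a derived, queryable view of it.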

DocGraph Modeling Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing

[Example: the person JSON document shown under DocGraph Modeling Normalize.]
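The idea of "denormalizing without denormalizing" can be sketched as a join performed at query time. The following is a conceptual Python illustration only, not the Optic API; `linked_values` is a hypothetical helper, and the triples and company names come from the person example in this deck.

```python
triples = [
    (11, "hasSpouse", 22),
    (11, "likesProduct", 1111),
    (11, "hasEmployer", 666),
]

# Company documents keyed by companyId; the names are stored only here.
companies = {
    555: {"companyName": "Nabisco"},
    666: {"companyName": "Goliath Dairies"},
}


def linked_values(subject, predicate, triples, documents, prop):
    """Follow triples from a subject and read a property from each linked document."""
    return [
        documents[obj][prop]
        for subj, pred, obj in triples
        if subj == subject and pred == predicate and obj in documents
    ]


# The employer name is read as if it were part of the person document,
# but it is never copied into it.
employers = linked_values(11, "hasEmployer", triples, companies, "companyName")
```

Because the name lives only in the company document, renaming the company requires no update to any person document.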

Relational Modeling Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: business tables, transaction tables, and reference tables.]

DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

[Example: the five JSON documents (person, order, product, company, and lookup lists) shown under Same Realistic Customer Order Model.]
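A small sketch of the last point, assuming the lookup document from the customer order example. The helper `is_valid` is hypothetical, and the split of the status values into "On Order" and "No Stock" is an assumption about the original slide.

```python
# One lookup document holding several lookup lists that would each be
# a separate reference table in a relational model.
lookup_doc = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def is_valid(lookup, list_name, value):
    """Check a value against one of the lists combined in the lookup document."""
    return value in lookup.get(list_name, [])


order_ok = is_valid(lookup_doc, "orderStatus", "Shipped")
item_ok = is_valid(lookup_doc, "productOrderStatus", "On Order")
```

One read of the lookup document serves every status check for an order and its items.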

Relational Modeling Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not overgeneralize in relational because it hides the purpose of the model

DocGraph Modeling Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

[Example: the person JSON document shown under DocGraph Modeling Normalize. Its customer and companyRelationships sections subclass the person as a customer, employee, and account rep.]
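One way to read the subclassing in the person document is that each optional top-level section adds a role to the base person. The sketch below illustrates that reading in Python; `roles_of` is a hypothetical helper, not part of any MarkLogic API, and the document is a trimmed version of the example.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
    ],
}


def roles_of(doc):
    """List the roles a person document plays, based on which sections it contains."""
    roles = ["person"]
    if "customer" in doc:
        roles.append("customer")
    for item in doc.get("companyRelationships", []):
        roles.extend(item["company"]["personCompanyRelationship"])
    return roles


roles = roles_of(person_doc)
```

A person who is not a customer simply omits the customer section, so no sparse columns or extra subclass tables are needed.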

Relational Modeling Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path

[Example: the person JSON document shown under DocGraph Modeling Normalize.]
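To see why unique property names matter when an equality index is keyed by name alone, consider this conceptual Python model. It is an assumption-laden toy, not MarkLogic's actual index implementation; `index_properties` and the document IDs are hypothetical.

```python
from collections import defaultdict


def index_properties(doc_id, node, index, name=None):
    """Index every scalar by (property name, value), ignoring its depth in the hierarchy."""
    if isinstance(node, dict):
        for key, value in node.items():
            index_properties(doc_id, value, index, key)
    elif isinstance(node, list):
        for item in node:
            index_properties(doc_id, item, index, name)
    elif name is not None:
        index[(name, node)].add(doc_id)


index = defaultdict(set)
index_properties("order-1", {"order": {
    "customer": {"customerName": "Mike Bowers"},
    "shipper": {"companyName": "FedEx"}}}, index)
index_properties("person-11", {"person": {"personName": "Mike Bowers"}}, index)

# Unique names keep the lookups distinct even though the values are identical.
orders_for_customer = index[("customerName", "Mike Bowers")]
people_named = index[("personName", "Mike Bowers")]
```

Had both documents used a generic `name` property, a single index entry would have mixed customers, people, and shippers, and the query would need extra filtering by path.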

Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Document PROs Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

[Example: the five JSON documents (person, order, product, company, and lookup lists) shown under Same Realistic Customer Order Model.]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
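The IO estimate on this slide can be checked with a few lines of arithmetic. The sketch below only restates the slide's figures (13 joins, 3-to-4 index IOs and 1-to-2 row IOs per join); the function and variable names are illustrative, not part of any database API.

```python
# Figures taken from the slide; names are illustrative only.
TABLES_JOINED = 13
INDEX_IOS_PER_JOIN = (3, 4)  # (low, high) estimate
ROW_IOS_PER_JOIN = (1, 2)    # (low, high) estimate


def relational_io_range(joins, index_ios, row_ios):
    """Return the (low, high) number of IOs needed to join `joins` tables."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


low, high = relational_io_range(TABLES_JOINED, INDEX_IOS_PER_JOIN, ROW_IOS_PER_JOIN)
print(f"relational: {low}-to-{high} IOs; single document read: 1 IO")
```

The low end uses 4 IOs per join and the high end uses 6, which is where the 52-to-78 range comes from.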


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Templated Data Extraction

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
  } }
]

[Figure: the relational equivalent uses three tables]
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
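As a rough illustration of why multi-valued properties remove the need for a junction table, the sketch below filters a person's addresses by purpose in plain Python. The nesting (an "address" object inside each array item) is an assumption about how the slide's snippet is structured, and `addresses_for` is a hypothetical helper, not a MarkLogic call.

```python
# Assumed shape of the slide's personAddresses snippet.
person = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],  # multi-valued
            "addressStreet": ["99 Smith Dr"],           # multi-valued
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }}
    ],
}


def addresses_for(doc, purpose):
    """Return every address in the document tagged with the given purpose."""
    return [
        item["address"]
        for item in doc.get("personAddresses", [])
        if purpose in item["address"].get("addressPurpose", [])
    ]


shipping = addresses_for(person, "shipping")
print(shipping[0]["addressLocality"])
```

One address serves two purposes here without a Persons Addresses row for each purpose.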

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing"
  } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT"
  } }
]
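A short sketch of how application code might consume such a multi-typed array: each record is inspected for the properties it actually carries. The records are flattened for brevity, and `describe` is a hypothetical helper introduced only for this example.

```python
# Two differently shaped payment records in one array (values from the slide).
payment_methods = [
    {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
     "creditCardNumber": "1234123412341234",
     "creditCardExpirationDate": "2011-11-11",
     "nameOnCreditCard": "MIKE BOWERS",
     "creditCardValidationAddress": "mailing"},
    {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
     "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
     "nameOnBankAccount": "Michael Bowers",
     "driversLicenseNumber": "11111111", "driversLicenseState": "UT"},
]


def describe(method):
    """Summarize a payment method based on which properties are present."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


summaries = [describe(m) for m in payment_methods]
print(summaries)
```

In a relational model the same data would typically need a supertype table plus one subtype table per payment kind.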


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values

{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
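To make "structure can be queried" concrete, here is a minimal sketch that lists every key path in a document and then tests for the presence of a nested key. It is plain Python over an in-memory dict, not a MarkLogic query; `key_paths` and the `personNickname` property are hypothetical names used only for illustration.

```python
# Assumed shape of the slide's person document (abbreviated).
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name",
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def key_paths(node, prefix=()):
    """Yield the path (tuple of keys) of every key in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + (key,)
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, prefix)


# Query the structure: is there any phone with a hidePhoneNumber key?
paths = set(key_paths(person_doc))
has_hidden_phone = ("person", "personPhones", "phone", "hidePhoneNumber") in paths

# Change the structure: adding a property is just a data update.
person_doc["person"]["personNickname"] = "Mike"  # hypothetical property
```

No schema change is involved in either step; both the query and the update operate on ordinary data.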

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
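The types listed above map directly onto what a standard JSON parser returns. The sketch below uses Python's built-in `json` module on a made-up document; the key names and values are invented for the example.

```python
import json

# A made-up document exercising each JSON type named on the slide.
text = (
    '{"name with spaces & symbols": "Zo\\u00eb", '
    '"count": 3, "ratio": 2.5, "active": true, '
    '"mixed": [1, "two", false, {"nested": null}], '
    '"sparse": {}}'
)

doc = json.loads(text)

# Map each top-level key to the Python type the parser produced.
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)
```

The `mixed` array holds a number, a string, a boolean, and an object side by side, and the empty `sparse` object is still valid.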

Document PROs: Meaningful Data Modeling

[Figure: excerpt from "A Relational Model of Data for Large Shared Data Banks", E. F. Codd, IBM Research Laboratory, San Jose, California; Information Retrieval, Volume 13, Number 6, June 1970]

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: RDF graph connecting nodes of type Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) through relationships labeled "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose"]

[Figure: Customer Order documents (JSON Customers, JSON Orders, JSON Products, JSON Vendors, XML Documentation) connected by relationships labeled "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product"]

Inside and Out




About the Author

Michael Bowers
• Principal Database Architect
• Using NoSQL professionally for 9 years
• Author
  – Pro CSS and HTML Design Patterns
  – Pro HTML5 and CSS3 Design Patterns
• mike@cssDesignPatterns.com

Church of Jesus Christ of Latter-day Saints
• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Six Data Paradigms

[Figure: six data paradigms placed on a chart whose horizontal axis runs from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures). The vertical axes run from low-latency operational velocity to high-bandwidth analytical volume, with scales for response time (hours, minutes, seconds, milliseconds, microseconds), data size (PB, TB, GB), and throughput (500 tps, 1000 tps, 10K tps, 100K tps).]

• Dimensional: Kimball data warehousing. Example: a star schema with Hospital, Surgeon, Operation, and Drug dimensions (each a key plus attributes) around a Drug Dose Facts table (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose).
• Relational: fixed, dense tables with flexible queries and joins. Example: Surgeon (Surgeon Number, Surgeon First Name, Surgeon Last Name), Operation Codes (Operation Code, Operation Name), Operation (Hospital Number, Operation Number, Surgeon Number, Operation Code), and Hospital (Hospital Number, Hospital Name), related by "performed at / schedules", "performed by / performs", and "classified by / classifies".
• Wide Column: fixed, dense tables with fixed queries and no joins (complex key).
• Key Value: predefined data structures (simple key): key → value 1; hash [ key 1 → value 1, key 2 → value 2 ]; list [ value 1, value 2 ]; set [ value 1, value 2 ].
• Document: sparse, variable data structures. Example: an XML record with Hospital Name John Hopkins, Operation Number 13, Operation Type Heart Transplant, Surgeon Name Dorothy Oz, and a drug table (Drug Name, Drug Manufacturer, Dose Size, Dose UOM): Minicillan, Drugs R Us, 200 mg; Maxicillan, Canada4Less Drugs, 400 mg; Minicillan, Drug USA, 150 mg.
• Graph: unlimited relationships. Example: an RDF graph of Surgeon, Operation, Person, and Hospital connected by "performed", "on", "works at", "at", "friend of", "likes", "has poor success at", and "fails at".

Top Ten Databases Overall

Rankings as printed on the slide, grouped by the nearest paradigm label:
• Graph (RDF): 6 MarkLogic
• Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
• newSQL: 2 Oracle x10
• SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Doc Warehouse (XML): 6 MarkLogic
• JSON Document: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key/Value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide-Column (complex key): 9 DataStax Cassandra

Do we combine multiple models?

[Figure: the six data paradigms chart again (Dimensional: Kimball data warehousing; Relational: fixed, dense tables with flexible queries and joins; Wide Column: fixed, dense tables with fixed queries and no joins; Key Value: predefined data structures; Document: sparse, variable data structures; Graph: unlimited relationships), with the same axes from fixed structure to flexible structure]

Multi-Model Enterprise NoSQL Databases

[Figure: the same chart with MarkLogic marked on several paradigms, including Graph (RDF), XML Doc Warehouse, JSON Document, Key/Value (simple key), and Wide-Column (complex key), labeled "1 Multi-model NoSQL Database" and "MarkLogic Operational"; the SQL, newSQL, Live Analytics, and Data Warehouse examples are unchanged]

Power of Combining XML Document and RDF Graph

Data + Narrative + Relationships = Contextual Information
(Semantic & Structural) = Meaningful Knowledge

[Figure: excerpts from E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks" (abstract and sections 1.1 through 1.2.3) shown beside an RDF graph. The graph connects nodes of type Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) through relationships labeled "geo-located in", "printed on", "related topic", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose".]

WAIT A MINUTE

Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy
2. They contain redundant data throughout the hierarchy that is hard to update
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities
3. They are easily inconsistent because it is easy to fail to update redundant data
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL
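The first answer, queries that work regardless of hierarchy, can be illustrated with a small sketch: a lookup by key name that ignores where in the tree the key sits. This is plain Python written for illustration, not how MarkLogic's indexes are implemented; `find_values` and both sample documents are hypothetical.

```python
def find_values(node, key):
    """Return every value stored under `key`, at any depth of dicts and lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


# The same fact stored at two different depths.
flat_doc = {"surgeonName": "Dorothy Oz"}
nested_doc = {"operation": {"staff": [{"surgeonName": "Dorothy Oz"}]}}

print(find_values(flat_doc, "surgeonName"))
print(find_values(nested_doc, "surgeonName"))
```

Because the query names the key rather than the access path, restructuring the document does not break it, which addresses the access path dependence Codd described.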

RDF Graph and XML enable us to turn content into meaningful knowledge

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries

[Figure: Customer Order documents (JSON Customers, JSON Orders, JSON Products, JSON Vendors, XML Documentation) connected by relationships labeled "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents, invoices, etc.", "Product manual", "Vendor order forms, invoices, etc.", "Customer liked product", and "Vendor received RMA on product"]

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
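As a small sketch of what reusing ontology predicates looks like, the triples below borrow predicate names from FOAF and Dublin Core Terms instead of inventing new ones. The subject identifiers (`/people/11`, `/docs/guide`) and the `objects` helper are made up for the example; a real system would store the triples in a triple store rather than a Python list.

```python
# Namespace IRIs of two widely used vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# (subject, predicate, object) triples; subject IRIs are hypothetical.
triples = [
    ("/people/11", FOAF + "name", "Mike Bowers"),
    ("/people/11", FOAF + "knows", "/people/12"),
    ("/docs/guide", DCTERMS + "creator", "/people/11"),
    ("/docs/guide", DCTERMS + "title", "Becoming a Document Modeling Guru"),
]


def objects(subject, predicate):
    """Return all objects linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects("/docs/guide", DCTERMS + "creator"))
```

Anyone familiar with FOAF or Dublin Core can read these relationships without a project-specific data dictionary.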

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.

Why is relational fixed and dense?

Surgeon ID | Surgeon Name  | Surgeon Specialty
1          | Dorothy Oz    | Cardiothoracic
2          | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13           | 1          | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13           | 100     | 200       | mg            | NULL
13           | 101     | NULL      | NULL          |
13           | 100     | 150       | mg            | NULL

Drug ID | Drug Name
100     | Minocycline
101     | Minomycin

Fixed field types

Fixed table structure

Fixed table relationships

Each row has same structure

Fixed set of tables
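The rigidity described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (operations, operation_drugs, drug_manufacturer) are assumptions chosen to mirror the slide, not a schema from the talk. A variable number of drugs per operation forces a second table, missing doses become NULLs, and a field that was never declared is rejected until the schema is altered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE operations (
        operation_id INTEGER PRIMARY KEY,
        surgeon_id   INTEGER,
        hospital_id  INTEGER
    );
    -- A separate table is needed because one operation can have many drugs.
    CREATE TABLE operation_drugs (
        operation_id  INTEGER REFERENCES operations(operation_id),
        drug_id       INTEGER,
        drug_dose     INTEGER,  -- nullable: a sparse value
        drug_dose_uom TEXT      -- nullable: a sparse value
    );
    INSERT INTO operations VALUES (13, 1, 7);
    INSERT INTO operation_drugs VALUES
        (13, 100, 200, 'mg'),
        (13, 101, NULL, NULL),
        (13, 100, 150, 'mg');
""")


def drugs_for(operation_id):
    """Return (drug_id, dose, unit) rows for one operation, in insert order."""
    cursor = conn.execute(
        "SELECT drug_id, drug_dose, drug_dose_uom FROM operation_drugs "
        "WHERE operation_id = ? ORDER BY rowid",
        (operation_id,),
    )
    return cursor.fetchall()


def add_undeclared_field():
    """Try to store a field the schema does not define; return the error text."""
    try:
        conn.execute(
            "INSERT INTO operation_drugs (operation_id, drug_id, drug_manufacturer) "
            "VALUES (13, 102, 'Drugs R Us')"
        )
    except sqlite3.OperationalError as err:
        return str(err)
    return None
```

Every row carries the same columns whether or not it has a value for them, and new kinds of data require a schema change before they can be loaded.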

How does NoSQL support variety and variability?

NoSQL documents can have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },      ← sparse property
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Variable Data Types

Variable Document Types

Variable Relationships

Variable and Sparse Collections

Variable Document Structures

Sparse Properties

Sparse Denormalized Properties
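A short Python sketch of what "structure is data" means in practice. The second document type (surgeonNote) and the helper names are hypothetical, added only to show two differently shaped documents sharing a collection; nothing here is MarkLogic API.

```python
documents = [
    {
        "_id": 1,
        "_type": "operation",
        "collections": ["operation", "transplants"],
        "operation": {
            "surgeonName": "Dorothy Oz",
            "administeredDrugs": [
                {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"},
                {"drugName": "Minomycin"},  # sparse: no dose properties at all
            ],
        },
    },
    # Hypothetical second document type living in the same collection.
    {
        "_id": 2,
        "_type": "surgeonNote",
        "collections": ["transplants"],
        "note": {"text": "Follow up in two weeks"},
    },
]


def in_collection(docs, name):
    """Ids of all documents that declare membership in the named collection."""
    return [doc["_id"] for doc in docs if name in doc.get("collections", [])]


def dose_sizes(doc):
    """Dose sizes for each administered drug; None where the property is absent."""
    drugs = doc.get("operation", {}).get("administeredDrugs", [])
    return [drug.get("drugDoseSize") for drug in drugs]
```

Code that reads such documents asks whether a property is present instead of assuming a fixed column, which is why absent properties cost nothing to store.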

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures

Developers work with data with maximal reliance on predictable structures

XML is ideal for content

Developers work with tagged content with minimal reliance on variable structure
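The difference shows up as soon as code reads the two formats. Below is a minimal sketch using only Python's standard json and xml.etree.ElementTree modules; the sample strings are trimmed versions of the paragraphs on these slides, and the function names are illustrative. In XML the date sits inside running text, so the text can be read straight through the tags. In JSON the date is one object among others, so code navigates a predictable structure.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"
)
json_text = (
    '{"paragraph": ['
    '{"type": "text", "value": "Data exists in separate objects"},'
    '{"type": "date", "value": "2017-04-02Z"}]}'
)


def xml_plain_text(source):
    """Read the text straight through the markup (tags sit on top of text)."""
    return "".join(ET.fromstring(source).itertext())


def xml_dates(source):
    """Pick out the tagged data embedded in the text."""
    return [element.text for element in ET.fromstring(source).iter("date")]


def json_dates(source):
    """Walk the predictable object structure to find date objects."""
    parts = json.loads(source)["paragraph"]
    return [part["value"] for part in parts if part["type"] == "date"]
```

Both formats carry the same date; what changes is whether the surrounding text or the surrounding structure is the primary thing the developer works with.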

Choosing Between XML and JSON

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.
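To make the contrast concrete, here is a small Python sketch that folds rows from three of those tables back into a single person document. The row layouts and the to_document helper are assumptions for illustration; only the property names (personPhones, phonePurpose, and so on) follow the JSON shown in this deck.

```python
# Rows as they might come back from three of the relational tables.
persons = [{"person_id": 11, "name": "Mike Bowers"}]
person_phones = [
    {"person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails = [
    {"person_id": 11, "purpose": "work", "address": "mike@somework.com"},
]


def to_document(person_id):
    """Fold the one-to-many child rows into arrays on one person document."""
    person = next(row for row in persons if row["person_id"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": person["name"],
            "personPhones": [
                {"phonePurpose": [row["purpose"]], "phoneNumber": row["number"]}
                for row in person_phones
                if row["person_id"] == person_id
            ],
            "personEmails": [
                {"emailPurpose": [row["purpose"]], "emailAddress": row["address"]}
                for row in person_emails
                if row["person_id"] == person_id
            ],
        },
    }
```

Each child table becomes an array property, so the joins needed to reassemble a person disappear once the entity is stored as one document.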

Person ERD

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}

More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Diagram: the Customer Order model as four kinds of JSON documents (Customers, Orders, Products, Vendors), connected by graph relationships labeled "Product that was ordered" and "Vendor who sold the product"]

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers"
    },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": …
    },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
          … } }
    ]
  }
}

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…": … } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ … ]
  }
}

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": "555…",
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Diagrams: One-to-one, One-to-many, Many-to-many, Reference Tables]

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
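The idea of projecting embedded structures into an index can be illustrated in plain Python. This is not TDE syntax or any MarkLogic API; extract_triples and the predicates hasName and hasEmail are hypothetical names used only to show the shape of the projection: the document stays whole, and each embedded value also becomes a (subject, predicate, object) row that can be queried on its own.

```python
def extract_triples(doc):
    """Project one person document into (subject, predicate, object) rows."""
    subject = doc["personId"]
    person = doc["person"]
    triples = [(subject, "hasName", person["personName"])]
    # Each item of a multi-valued property yields its own row.
    for email in person.get("personEmails", []):
        triples.append((subject, "hasEmail", email["emailAddress"]))
    return triples


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personEmails": [
            {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"},
            {"emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com"},
        ],
    },
}
```

Because the rows are derived from the document rather than stored in it, the document keeps a single normalized copy of each value.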

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
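A rough Python sketch of "denormalizing without denormalizing". It imitates the effect only and does not use the Optic API; the triple_index list, the orderedBy predicate, and the order_view helper are all hypothetical. The order document stores no customer name, yet a query-time view can present one by following a linking triple into the index.

```python
# Rows as they might sit in a triple index (illustrative data only).
triple_index = [
    (1, "orderedBy", 11),  # order 1 links to person 11
    (11, "hasName", "Mike Bowers"),
    (11, "hasEmail", "mike@somework.com"),
]

# The stored order document carries no copy of the customer's name.
orders = {1: {"orderNumber": "111-11-111", "orderStatus": "Shipped"}}


def objects(subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in triple_index if s == subject and p == predicate]


def order_view(order_id):
    """Present the order with the linked customer's name joined in at query time."""
    view = dict(orders[order_id])  # copy, so the stored document is untouched
    for customer_id in objects(order_id, "orderedBy"):
        names = objects(customer_id, "hasName")
        if names:
            view["customerName"] = names[0]
    return view
```

If the person's name changes, only the person document and its derived rows change; every order view picks up the new value without any order document being rewritten.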

Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Diagrams: Business Tables, Transaction Tables, Reference Tables]

DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
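The last bullet can be sketched in Python. This is a minimal illustration, assuming a single lookup document that holds both the `orderStatus` and `productOrderStatus` lists from the example data. The exact document shape and the helper `is_valid_status` are assumptions made for this sketch, not part of the deck.

```python
# One lookup document holding several lookup lists that a relational
# model would typically keep in separate lookup tables.
# The list values are read from the deck's example data; the exact
# document shape is an assumption.
order_lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": [
        "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock",
    ],
}


def is_valid_status(lookup: dict, list_name: str, value: str) -> bool:
    """Return True if value appears in the named lookup list (hypothetical helper)."""
    return value in lookup.get(list_name, [])


print(is_valid_status(order_lookup, "orderStatus", "Shipped"))
print(is_valid_status(order_lookup, "productOrderStatus", "Returned"))
```

Keeping related lists in one document means a single read fetches every list an order screen needs to validate against.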

Relational Modeling Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace customer table with person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
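The subclassing in the person document above can be sketched in Python. This is an abbreviated illustration, not the author's implementation. The nesting of the `schema` and `company` wrapper objects is inferred from the flattened slide text, and the helpers `roles_of` and `company_roles` are hypothetical.

```python
# A generalized person document: the base "person" object is extended by
# optional role objects ("customer", "companyRelationships"), and the
# "schemas" list records which roles are present.
# Field names follow the deck's example; values are abbreviated and the
# wrapper-object nesting is an assumption.
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person"}},
        {"schema": {"type": "customer"}},
        {"schema": {"type": "companyRelationships"}},
    ],
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {
            "companyIdFk": 555,
            "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        }},
        {"company": {
            "companyIdFk": 666,
            "companyName": "Goliath Dairies",
            "personCompanyRelationship": [
                "independentContractorFor", "deliveryPersonFor",
            ],
        }},
    ],
}


def roles_of(doc: dict) -> list:
    """List the schema types a document declares (hypothetical helper)."""
    return [entry["schema"]["type"] for entry in doc.get("schemas", [])]


def company_roles(doc: dict) -> dict:
    """Map each related company name to the person's roles at that company."""
    return {
        rel["company"]["companyName"]: rel["company"]["personCompanyRelationship"]
        for rel in doc.get("companyRelationships", [])
    }


print(roles_of(person_doc))
print(company_roles(person_doc))
```

Adding a new role is a matter of adding another object and another `schemas` entry; documents without that role simply omit it.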

Relational Modeling Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
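The reason for unique property names can be shown with a toy name-based index in Python. This sketch is only a model of the idea that values are indexed by property name regardless of position. It is not MarkLogic's index implementation, and `index_by_property_name` is a hypothetical helper.

```python
from collections import defaultdict


def index_by_property_name(doc_id, node, index, name=None):
    """Record (property name, scalar value) -> doc ids, ignoring hierarchy."""
    if isinstance(node, dict):
        for key, value in node.items():
            index_by_property_name(doc_id, value, index, key)
    elif isinstance(node, list):
        for item in node:
            index_by_property_name(doc_id, item, index, name)
    elif name is not None:
        index[(name, node)].add(doc_id)


# Generic names: "status" means different things in different entities,
# yet a name-based lookup cannot tell them apart.
ambiguous = defaultdict(set)
index_by_property_name("person-11", {"person": {"status": "active"}}, ambiguous)
index_by_property_name("company-555", {"company": {"status": "active"}}, ambiguous)

# Unique names: each lookup hits only the intended kind of entity.
unique = defaultdict(set)
index_by_property_name("person-11", {"person": {"personStatus": "active"}}, unique)
index_by_property_name("company-555", {"company": {"companyStatus": "active"}}, unique)

print(sorted(ambiguous[("status", "active")]))
print(sorted(unique[("personStatus", "active")]))
```

With generic names, a lookup on `status` matches both documents; with `personStatus` and `companyStatus`, the name alone selects the entity type.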

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
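The IO estimate above is simple arithmetic, sketched here in Python. The per-join figures (4 to 6 IOs) and the 13 joins come from the slide; the function name `relational_io_range` is a hypothetical helper, and real IO counts depend on caching and the storage engine.

```python
def relational_io_range(joins: int, min_io_per_join: int = 4, max_io_per_join: int = 6):
    """Return the (low, high) IO estimate for the given number of joins."""
    return joins * min_io_per_join, joins * max_io_per_join


low, high = relational_io_range(13)
print(f"Relational: {low}-to-{high} IOs; single document: 1 IO")
```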

Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction

Document PROs Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States
]

Relational tables for the same data:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
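To make the comparison concrete, the Python sketch below shreds one person document into rows for the three relational tables listed above. The document shape is inferred from the flattened slide text, the generated `addressId` values are hypothetical, and only a few columns are carried over.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }},
    ],
}


def shred(doc):
    """Split one document into rows for Persons, Addresses, and Persons Addresses."""
    persons = [{"personId": doc["personId"], "name": doc["person"]["personName"]}]
    addresses, person_addresses = [], []
    for address_id, entry in enumerate(doc["personAddresses"], start=1):
        addr = entry["address"]
        addresses.append({
            "addressId": address_id,  # surrogate key invented for the sketch
            "street": " ".join(addr["addressStreet"]),
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postalCode": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        # The multi-valued addressPurpose becomes one join row per value.
        for purpose in addr["addressPurpose"]:
            person_addresses.append({
                "personId": doc["personId"],
                "addressId": address_id,
                "addressPurpose": purpose,
            })
    return persons, addresses, person_addresses


persons, addresses, person_addresses = shred(person_doc)
print(len(persons), len(addresses), len(person_addresses))
```

One document with a two-value `addressPurpose` array turns into four rows across three tables, which then have to be joined to reassemble the person.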

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed
• Mixing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT
]
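A short Python sketch of the multi-typed array above: two payment methods with different fields sit in one list, and the code branches on which fields are present. The nesting under `paymentMethod` is inferred from the slide text, and `describe` is a hypothetical helper.

```python
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    }},
]


def describe(entry: dict) -> str:
    """Summarize a payment method, branching on which fields are present."""
    method = entry["paymentMethod"]
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


for entry in payment_methods:
    print(describe(entry))
```

In a relational model the same data would usually need either one wide table with many nullable columns or a separate table per payment type.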

Document PROs Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values
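Both ideas, querying structure and changing structure by updating data, can be sketched in plain Python over a parsed JSON document. The sample document is modeled on the deck's phone data. The helpers `has_key` and `rename_key` and the new key name `phoneNumberHidden` are hypothetical, and a database would do this with its own query language rather than in application code.

```python
def has_key(node, key):
    """Return True if key occurs anywhere in the nested structure."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def rename_key(node, old, new):
    """Return a copy of the structure with every occurrence of old renamed to new."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new)
                for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query the structure: is the key present anywhere in the document?
print(has_key(doc, "hidePhoneNumber"))

# Change the structure by updating data: rename a key throughout.
updated = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
print(has_key(updated, "hidePhoneNumber"), has_key(updated, "phoneNumberHidden"))
```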

personId 11

person
personName Some very long name with many UTF-8 international characters…
personBirthDate 1982-02-02
personHeightInFeet 575
personPhones [
phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true

]

Document PROs Simple Easy Data Types
• Attribute names have unlimited length and can contain any characters
• Strings have unlimited size and can contain any internationalized UTF-8 characters
• Numbers contain integers or decimals
• Booleans are true or false
• Arrays can have values of any type and can mix types
• Objects have unlimited number of attributes, can be nested, and can be sparsely populated
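The data types listed above map directly onto what a JSON parser returns. The Python sketch below parses a small document modeled on the deck's person example and prints the type of each value. The value `5.75` for `personHeightInFeet` assumes the decimal point was lost in the slide text, and `mixedArray` is a hypothetical property added to show mixed types.

```python
import json

text = """
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
      {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                 "hidePhoneNumber": true}}
    ],
    "mixedArray": ["text", 42, 1.5, false, null, {"nested": []}]
  }
}
"""

doc = json.loads(text)
person = doc["person"]

# Strings, numbers, booleans, arrays, and objects arrive as native types.
for key, value in person.items():
    print(f"{key}: {type(value).__name__}")

# An array may mix element types freely.
print([type(item).__name__ for item in person["mixedArray"]])
```

Note that the second phone carries a `hidePhoneNumber` property the first one lacks, which is the sparse population the last bullet describes.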

Document PROs Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of entities connected by labeled relationships. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Relationship labels: "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", "solution".]

[Figure: documents of different formats connected by graph relationships. Nodes: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), Documentation (XML). Labels: "Customer Order", "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", "Vendor received RMA on product".]

Example XML document content:
Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size   Dose UOM
Minicillan    Drugs R Us            200         mg
Maxicillan    Canada4Less Drugs     400         mg
Minicillan    Drug USA              150         mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: Johns Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg



About the Author

Michael Bowers
• Principal Database Architect
• Using NoSQL professionally for 9 years
• Author
  – Pro CSS and HTML Design Patterns
  – Pro HTML5 and CSS3 Design Patterns
• mike@cssDesignPatterns.com

Church of Jesus Christ of Latter-day Saints
• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Six Data Paradigms

• Dimensional: Kimball Data Warehousing
• Relational: Fixed Dense Tables with Flexible Queries & Joins
• Wide Column: Fixed Dense Tables w/ Fixed Queries, No Joins
• Key Value: Predefined Data Structures (key: value 1; hash [ key 1: value 1, key 2: value 2 ]; list value 1, list value 2; [ set value 1, set value 2 ])
• Document: Sparse Variable Data Structures
• Graph: Unlimited Relationships

[Figure: the paradigms laid out from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures). Axis labels: Low Latency Operational Velocity (hours, minutes, seconds, milliseconds, microseconds; 500 tps, 1,000 tps, 10K tps, 100K tps) and High Bandwidth Analytical Volume (PB, TB, GB). Paradigm labels: Graph, Dimensional, Relational, Document, Wide Column, Key Value.]

Each paradigm is illustrated with the same hospital example:
• Graph (RDF): Surgeon, Operation, Person, and Hospital nodes joined by edges such as performed, on, works at, at, friend of, likes, has poor success at, and fails at.
• Dimensional (Data Warehouse, Live Analytics): a Drug Dose Facts table (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose) surrounded by Hospital, Surgeon, Operation, and Drug dimensions, each with a key and attributes.
• Relational (SQL, newSQL): Surgeon (Surgeon Number, Surgeon First Name, Surgeon Last Name), Operation Codes (Operation Code, Operation Name), Operation (Hospital Number, Operation Number, Surgeon Number, Operation Code), and Hospital (Hospital Number, Hospital Name), related by performed at / schedules, performed by / performs, and classified by / classifies.
• Document (XML Doc Warehouse, JSON Document): one document holding the hospital name, operation number, operation type, surgeon name, and the administered drugs (drug name, drug manufacturer, dose size, dose UOM).
• Key Value: simple key.
• Wide Column: complex key.

Top Ten Databases Overall

Numbers as printed beside each paradigm on the slide:
• Graph (RDF): 6 MarkLogic
• Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
• newSQL: 2 Oracle x10
• SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Doc Warehouse (XML): 6 MarkLogic
• JSON Document: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key Value: 5 AWS DynamoDB, 10 Redis
• Wide Column: 9 DataStax Cassandra

Do we combine multiple models?

[Figure: the six-paradigm chart (Dimensional, Relational, Wide Column, Key Value, Document, Graph) with the same axes, running from fixed structure to flexible structure, and the same hospital example diagrams.]

Multi-Model Enterprise NoSQL Databases

[Figure: the same chart with MarkLogic marked in several places (beside the Graph (RDF) diagram, the XML document diagram, and the JSON Document / Key Value / Wide-Column labels), together with the labels "1 Multi-model NoSQL Database" and "MarkLogic Operational".]

Power of Combining XML Document and RDF Graph

Narrative + Data + Relationships = Contextual Information
(Semantic & Structural) = Meaningful Knowledge

A Relational Model of Data for Large Shared Data Banks
E. F. CODD, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency. …The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: the excerpt annotated as a graph. Node types: T Topic, P Person, L Location, P Publication, R Reference, O Organization, E Event. Edge labels: geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, purpose, solution.]

WAIT A MINUTE
Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
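The first answer above (indexes that flatten the hierarchy) can be sketched in a few lines. This is only an illustration in plain Python, not MarkLogic's index implementation: the function names `index_properties` and `find`, the document IDs, and the nested `staff` wrapper are all assumptions made for the example.

```python
from collections import defaultdict


def index_properties(node, doc_id, index, name=None):
    """Record every (property name, scalar value) pair, ignoring nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            index_properties(value, doc_id, index, key)
    elif isinstance(node, list):
        for item in node:
            index_properties(item, doc_id, index, name)
    elif name is not None:
        index[(name, node)].add(doc_id)


def find(index, name, value):
    """Return the IDs of documents holding name == value at any depth."""
    return sorted(index.get((name, value), set()))


# Two hypothetical documents that store the same property at different depths.
flat_doc = {"surgeonName": "Dorothy Oz", "operationNumber": 13}
nested_doc = {"operation": {"staff": {"surgeonName": "Dorothy Oz"},
                            "operationNumber": 14}}

index = defaultdict(set)
index_properties(flat_doc, "op-13", index)
index_properties(nested_doc, "op-14", index)

# The same query finds both documents even though the hierarchy differs.
matches = find(index, "surgeonName", "Dorothy Oz")
print(matches)
```

Because the lookup key is the property name rather than a path, moving `surgeonName` deeper into the document does not break the query, which is the access-path dependence Codd described.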


RDF Graph and XML enable us to turn content into meaningful knowledge

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries
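A minimal sketch of the idea, assuming an in-memory store rather than a real database: documents are keyed by ID, and relationships are plain (subject, predicate, object) triples that can be added at run time. The IDs 11, 1, 1111, and 555 and the predicate `likesProduct` follow the deck's later examples; the other predicate names and the helper `objects_of` are hypothetical.

```python
# Each business entity is one document, keyed by its ID.
documents = {
    11: {"type": "customer", "name": "Mike Bowers"},
    1: {"type": "order", "orderNumber": "111-11-111"},
    1111: {"type": "product", "name": "Oreo Thins Sandwich Cookies"},
    555: {"type": "vendor", "name": "Nabisco"},
}

# Relationships are data: (subject, predicate, object) triples between IDs.
triples = [
    (11, "placedOrder", 1),        # hypothetical predicate
    (1, "orderedProduct", 1111),   # hypothetical predicate
    (555, "soldProduct", 1111),    # hypothetical predicate
    (11, "likesProduct", 1111),
]


def objects_of(triples, subject, predicate):
    """Follow one kind of relationship from a subject to the related documents."""
    return [documents[o] for s, p, o in triples if s == subject and p == predicate]


liked = [doc["name"] for doc in objects_of(triples, 11, "likesProduct")]

# A new kind of relationship needs no schema change, only a new triple.
triples.append((555, "receivedRmaOn", 1111))  # hypothetical predicate
rma = [doc["name"] for doc in objects_of(triples, 555, "receivedRmaOn")]
print(liked, rma)
```

The design point is that adding `receivedRmaOn` required no new table or foreign-key column, only one more row of relationship data.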

[Figure: JSON documents for Customers, Orders, Products, and Vendors, plus XML Documentation, connected by labeled graph relationships: Customer Order; Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents, invoices, etc.; Product manual; Vendor order forms, invoices, etc.; Customer liked product; Vendor received RMA on product.]

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
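As a small illustration of reusing published predicate names, the sketch below builds N-Triples lines with `foaf:knows` from FOAF and `dcterms:creator` from Dublin Core Terms. The `http://example.org/` subject and object URIs and the helper functions are assumptions for the example, not part of the deck.

```python
# Namespaces of two vocabularies from the list above.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical base URI for our own entities.
BASE = "http://example.org/"


def person(person_id):
    return f"{BASE}person/{person_id}"


def n_triple(subject, predicate, obj):
    """Format one statement whose subject, predicate, and object are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


lines = [
    # Reuse foaf:knows instead of inventing a private "friendOf" predicate.
    n_triple(person(11), FOAF + "knows", person(22)),
    # Reuse dcterms:creator for authorship of a (hypothetical) document.
    n_triple(BASE + "doc/product-manual", DCTERMS + "creator", person(11)),
]
print("\n".join(lines))
```

Anyone who already knows FOAF or Dublin Core can read these statements without consulting a private data dictionary, which is the time saving the slide refers to.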

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

• Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
  NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

• Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
  NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

• Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
  NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

• Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
  NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

• Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
  NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense?

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL |
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

• Fixed field types
• Fixed table structure
• Fixed table relationships
• Each row has same structure
• Fixed set of tables
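The fixed, dense nature of these tables can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the snake_case table and column names are invented, the slide's "Sparse" column is left out, and `drug_route` is a hypothetical new attribute used only to show that structure changes require DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug (
    drug_id   INTEGER PRIMARY KEY,
    drug_name TEXT NOT NULL
);
CREATE TABLE operation_drug (
    operation_id  INTEGER NOT NULL,
    drug_id       INTEGER NOT NULL REFERENCES drug(drug_id),
    drug_dose     INTEGER,   -- nullable: the only way a row can be sparse
    drug_dose_uom TEXT
);
INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
INSERT INTO operation_drug VALUES
    (13, 100, 200, 'mg'),
    (13, 101, NULL, NULL),
    (13, 100, 150, 'mg');
""")

# Reassembling one operation's drugs requires a join across the fixed tables.
rows = conn.execute("""
    SELECT d.drug_name, od.drug_dose, od.drug_dose_uom
    FROM operation_drug AS od
    JOIN drug AS d ON d.drug_id = od.drug_id
    WHERE od.operation_id = 13
    ORDER BY od.rowid
""").fetchall()

# Every row keeps all columns; a missing dose is stored as NULL (None in Python).
# Adding a new attribute means changing the table structure for every row.
conn.execute("ALTER TABLE operation_drug ADD COLUMN drug_route TEXT")
columns = [info[1] for info in conn.execute("PRAGMA table_info(operation_drug)")]
print(rows, columns)
```

The second row carries two NULLs because the row shape is fixed; the `ALTER TABLE` step is the "it takes code to modify a table's schema" point from the comparison.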

How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.

    {
      "_id": 1,
      "_type": "operation",
      "collections": ["operation", "transplants"],
      "operation": {
        "hospitalName": "Johns Hopkins",
        "operationTypeName": "Heart Transplant",
        "surgeonName": "Dorothy Oz",
        "operationNumber": 13,
        "administeredDrugs": [
          { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
          { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
          { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
        ]
      },
      "relations": {
        "values": [
          { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
          { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
          { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
          { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
          { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
          { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
        ]
      }
    }

The second administeredDrugs entry omits drugDoseSize and drugDoseUOM (sparse property).

• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties
• Sparse Denormalized Properties
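Application code has to tolerate sparse properties. The sketch below reads the administered drugs from a trimmed copy of the operation document; the helper `dose_label` and the added `drugRoute` property are assumptions introduced for illustration.

```python
# A trimmed copy of the slide's operation document.
operation_doc = {
    "_id": 1,
    "operation": {
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
}


def dose_label(drug):
    """Describe a dose, treating absent properties as 'not recorded'."""
    size = drug.get("drugDoseSize")
    uom = drug.get("drugDoseUOM")
    if size is None or uom is None:
        return "dose not recorded"
    return f"{size} {uom}"


drugs = operation_doc["operation"]["administeredDrugs"]
labels = [f'{d["drugName"]}: {dose_label(d)}' for d in drugs]

# Structure is data: one document can gain a property without a schema change.
drugs[1]["drugRoute"] = "oral"  # hypothetical property
print(labels)
```

Unlike a nullable column, the missing dose is simply absent from the second entry, and the new `drugRoute` property exists only where it was added.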

Choosing Between XML and JSON

XML is ideal for content
Developers work with tagged content with minimal reliance on variable structure

    <section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <!--This is a comment-->
      <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
      <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
      <paragraph>XML allows data, such as this date <date xsi:type="xsd:date">2017-04-02Z</date>, to be
        freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
    </section>

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures
Developers work with data with maximal reliance on predictable structures

    {
      "heading": "JSON is best for nested object DATA",
      "paragraphs": [
        { "paragraph": [ { "type": "text", "value": "Everything is an object." } ] },
        { "paragraph": [
            { "type": "text", "value": "Text exists only in objects." },
            { "type": "text", "value": "Data exists in separate objects:" },
            { "type": "date", "value": "2017-04-02Z" },
            { "type": "break", "value": null },
            { "type": "text", "value": "JSON is designed for objects to…" },
            { "type": "text", "value": "be sprinkled with strings." } ] }
      ]
    }

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
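The contrast between mixed content and nested objects shows up as soon as both forms are parsed. The sketch below uses only the Python standard library; the two inputs are shortened versions of the slide's samples (namespaces and the `xsi:type` attribute are omitted to keep the example small).

```python
import json
import xml.etree.ElementTree as ET

# XML: a date element sits in the middle of running text (mixed content).
xml_text = ('<paragraph>XML allows data such as this date '
            '<date>2017-04-02Z</date> to be freely mixed into the text.</paragraph>')

# JSON: text and data must each live in their own object.
json_text = '''{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}
]}'''

paragraph = ET.fromstring(xml_text)
# Reading the text ignores the tags; the sentence is still whole.
full_text = "".join(paragraph.itertext())
# The same parse also exposes the tagged data.
xml_date = paragraph.find("date").text

items = json.loads(json_text)["paragraph"]
# In JSON the date is found by walking a predictable object structure.
json_date = next(item["value"] for item in items if item["type"] == "date")
print(full_text, xml_date, json_date)
```

The XML paragraph can be read as prose or mined for data from one representation, while the JSON version depends on every consumer knowing the `type`/`value` object layout.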

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Customer Model

• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Driver's License Num, Driver's License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.
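To see why multivalued properties multiply tables, the sketch below shreds a small person document into relational-style rows. It is an illustration only: the function `shred`, the generated phone IDs, and the separate `phone_purposes` table are assumptions, and the document is a two-phone excerpt of the deck's person example.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}


def shred(doc):
    """Split one document into rows for three hypothetical tables."""
    person_id = doc["personId"]
    persons = [(person_id, doc["person"]["personName"])]
    person_phones = []
    phone_purposes = []
    # Each array in the document becomes its own table with a key back to its parent.
    for phone_id, entry in enumerate(doc["person"]["personPhones"], start=1):
        phone = entry["phone"]
        person_phones.append((phone_id, person_id, phone["phoneNumber"]))
        for purpose in phone["phonePurpose"]:
            phone_purposes.append((phone_id, purpose))
    return {"persons": persons,
            "person_phones": person_phones,
            "phone_purposes": phone_purposes}


tables = shred(person_doc)
print({name: len(rows) for name, rows in tables.items()})
```

Two nested arrays already produce three tables; repeating this for addresses, emails, websites, payment methods, and company relationships is how one business entity grows to the 13 tables in the ERD.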

One Person JSON Doc = 13 Tables

    {
      "personId": 11,
      "schemas": [
        { "schema": { "type": "person", "schemaVersion": "1.0" } },
        { "schema": { "type": "customer", "schemaVersion": "1.0" } },
        { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
      ],
      "triples": [
        { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
        { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
        { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
        { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
        { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
      ],
      "person": {
        "personStatus": "active",
        "personName": "Mike Bowers",
        "personOnlineName": "MTB1",
        "personNameVariations": {
          "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
          "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
          "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
          "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
        },
        "personBirthDate": "1981-01-01",
        "personGender": "male",
        "personEthnicity": "caucasian",
        "personShippingPreference": "free",
        "personPhones": [
          { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
          { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
          { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
        ],
        "personAddresses": [
          { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015", "addressCountry": "United States" } }
        ],
        "personEmails": [
          { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
          { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
        ],
        "personWebsites": [
          { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
          { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
        ],
        "personPaymentMethods": [
          { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                               "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                               "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
          { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                               "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                               "nameOnBankAccount": "Michael Bowers",
                               "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
        ]
      },
      "customer": {
        "customerStatus": "active",
        "customerJoinDate": "2001-01-01",
        "customerReviewerRank": 11111
      },
      "companyRelationships": [
        { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                       "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                       "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                       "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                       "accountRepSupportedProducts": [
                         { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
        { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                       "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                       "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                       "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                       "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
      ]
    }


More Realistic Customer Order Model

[Figure: entity-relationship diagram of the customer order model.]

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities

[Figure: JSON documents for Customers, Orders, Products, and Vendors, connected by the relationships Customer Order, Product that was ordered, and Vendor who sold the product.]

    {
      "personId": 11,
      "person": {
        "personName": "Mike Bowers",
        "personBirthDate": "1981-01-01",
        "personGender": "male",
        "personEthnicity": "caucasian",
        "personShippingPreference": "free",
        "personPhones": [
          { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
        "personAddresses": [
          { "address": {
              "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
              "addressLocality": "Clinton", "addressRegion": "UT",
              "addressPostalCode": "84015", "addressCountry": "United States" } } ],
        "personEmails": [
          { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
        "personWebsites": [
          { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
        "personPaymentMethods": [
          { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
              "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
              "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
      },
      "customer": {
        "customerStatus": "active",
        "customerJoinDate": "2001-01-01",
        "customerReviewerRank": 11111
      }
    }

    {
      "orderId": 1,
      "order": {
        "orderNumber": "111-11-111",
        "orderDate": "2001-01-01T09:00:00Z",
        "orderStatus": "Shipped",
        "customer": {
          "customerId": 11,
          "customerName": "Mike Bowers" },
        "orderShipment": {
          "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
          "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
        "orderShippingAddress": {
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" },
        "orderedProducts": [
          { "product": {
              "productIdFk": 1111,
              "productName": "Oreo Thins Sandwich Cookies",
              "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
              "productWeightOz": 11,
              "productOrderStatus": "Shipped",
              "productShippedDate": "2001-01-01+0101",
              "seller": { "sellerId": 555, "sellerName": "Nabisco" },
              "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
          { "product": {
              "productIdFk": 2222,
              "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
              "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" …

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New Invoiced Shipped Closed ]

productOrderStatus [ None Allocated Invoiced Shipped On Order No Stock ]


Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Diagram labels: One-to-one, One-to-many, Many-to-many, Reference Tables]
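The normalization steps above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (person, person_phone) are assumptions, not taken from the slide's diagram, and the sample values are borrowed from the person document used throughout this deck.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,   -- one primary key per table
        person_name TEXT NOT NULL          -- one column per single-valued attribute
    );
    -- The multivalued "phones" attribute becomes a one-to-many child table.
    CREATE TABLE person_phone (
        phone_id     INTEGER PRIMARY KEY,
        person_id    INTEGER NOT NULL REFERENCES person(person_id),
        phone_number TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.executemany(
    "INSERT INTO person_phone (person_id, phone_number) VALUES (?, ?)",
    [(11, "+1 (111) 111-1111"), (11, "+1 (111) 111-1112")],
)

# Reassembling the entity requires a join across the two tables.
rows = conn.execute("""
    SELECT p.person_name, ph.phone_number
    FROM person p
    JOIN person_phone ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
print(rows)
```

Even this two-table case needs a join to read one person back, which is the cost the document model in the following slides tries to avoid.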

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]

triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null
]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free

personPhones [
  phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
  phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
  phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
]

personAddresses [
  address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States
]

personEmails [
  email emailPurpose [ work primary ] emailAddress mikesomeworkcom
  email emailPurpose [ personal ] emailAddress mikesomewherecom
]

personWebsites [
  website websitePurpose [ work ] websiteUrl httpwwwmikecom
  website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike
]

personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111 driversLicenseState UT
]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company companyIdFk 555 companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company companyIdFk 666 companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri performanceNotes He is always late
]
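As a rough sketch of "one business entity per JSON document", the fragment below rebuilds a small part of the person document as a Python dictionary and serializes it with the standard json module. The property names and values come from the slide; the exact nesting (for example, placing personPhones inside person) and the helper primary_phone are assumptions made for illustration, because the extracted slide text has lost its braces.

```python
import json

# One business entity per document. Multi-valued properties are arrays of
# embedded objects. The nesting shown here is an assumption.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
        ],
    },
}

def primary_phone(doc):
    """Return the number of the first phone whose purposes include 'primary', or None."""
    for phone in doc["person"]["personPhones"]:
        if "primary" in phone["phonePurpose"]:
            return phone["phoneNumber"]
    return None

serialized = json.dumps(person_doc)   # the whole entity travels as one document
restored = json.loads(serialized)
print(primary_phone(restored))
```

Reading the primary phone needs no join: the embedded array is already part of the entity.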

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing

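The idea of "denormalizing without denormalizing" can be simulated in plain Python. This sketch does not use MarkLogic, TDE, or the Optic API; the functions extract_name_triples and employer_names, and the hasName predicate, are hypothetical stand-ins that only mimic the concept: each document is stored once, name data is projected into triples, and a query follows a linking triple to read the linked document's data.

```python
# Two documents, each stored once (no copied data).
documents = {
    11: {"personId": 11, "personName": "Mike Bowers"},
    666: {"companyId": 666, "companyName": "Goliath Dairies"},
}

# Triples link documents by ID: (subject, predicate, object).
triples = [
    (11, "hasEmployer", 666),
    (11, "likesProduct", 1111),
]

def extract_name_triples(docs):
    """Hypothetical stand-in for template-driven extraction:
    project each document's name property into a (id, 'hasName', name) triple."""
    out = []
    for doc_id, doc in docs.items():
        name = doc.get("personName") or doc.get("companyName")
        if name is not None:
            out.append((doc_id, "hasName", name))
    return out

def employer_names(person_id, link_triples, name_triples):
    """Follow hasEmployer links and return the linked documents' names."""
    names = {s: o for s, p, o in name_triples if p == "hasName"}
    return [
        names[o]
        for s, p, o in link_triples
        if s == person_id and p == "hasEmployer" and o in names
    ]

print(employer_names(11, triples, extract_name_triples(documents)))
```

The company name is never copied into the person document, yet the query returns it as though it were there.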

Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Diagram labels: Business Tables, Transaction Tables, Reference Tables]
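A minimal sqlite3 sketch of the three table kinds named above. The schema is an assumption made for illustration (the slide's diagram did not survive extraction); the sample values are borrowed from the order and product documents in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Business tables: meaningful on their own, independent of any transaction.
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, person_name  TEXT NOT NULL);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_status (status TEXT PRIMARY KEY);
    -- Transaction table: joins business tables together in one context (an order).
    CREATE TABLE customer_order (
        order_id   INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person(person_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        status     TEXT    NOT NULL REFERENCES order_status(status)
    );
    INSERT INTO person  VALUES (11, 'Mike Bowers');
    INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies');
    INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO customer_order VALUES (1, 11, 1111, 'Shipped');
""")

# The order context is created by combining the independent tables.
order_rows = conn.execute("""
    SELECT o.order_id, p.person_name, pr.product_name, o.status
    FROM customer_order o
    JOIN person  p  ON p.person_id   = o.person_id
    JOIN product pr ON pr.product_id = o.product_id
""").fetchall()
print(order_rows)
```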


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
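The last point can be illustrated with a single lookup document that holds several lookup lists, checked against an abbreviated order document. This is a sketch under assumptions: the split of the status values into list items is my reading of the garbled slide text, the order document is shortened, and invalid_statuses is a hypothetical helper.

```python
# Several lookup lists combined in one document.
lookup_doc = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# A transaction document that carries everything about the order (abbreviated).
order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"productIdFk": 1111, "productOrderStatus": "Shipped"},
        ],
    },
}

def invalid_statuses(order, lookups):
    """Return (property, value) pairs whose value is not in the matching lookup list."""
    problems = []
    body = order["order"]
    if body["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", body["orderStatus"]))
    for product in body["orderedProducts"]:
        status = product["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems

print(invalid_statuses(order_doc, lookup_doc))
```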

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace a customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below (excerpt of the person document shown under DocGraph Modeling Normalize)

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company companyIdFk 555 companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company companyIdFk 666 companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri performanceNotes He is always late
]
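One way to read the subclassing in the person document is that each subclass is an optional top-level section, declared in the schemas list. The sketch below assumes that reading; the simplified shape of the schemas entries and the helpers is_a, declared_but_missing, and company_roles are hypothetical.

```python
# Abbreviated person document; each subclass is an optional section.
person_doc = {
    "personId": 11,
    "schemas": [
        {"type": "person"},
        {"type": "customer"},
        {"type": "companyRelationships"},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"companyName": "Nabisco",
         "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
    ],
}

def is_a(doc, section):
    """A person 'is a' customer (etc.) when the matching section is present."""
    return section in doc

def declared_but_missing(doc):
    """Schema types declared in 'schemas' that have no matching section."""
    return [s["type"] for s in doc["schemas"] if s["type"] not in doc]

def company_roles(doc):
    """All relationship kinds the person holds across companies."""
    return sorted(
        role
        for rel in doc.get("companyRelationships", [])
        for role in rel["personCompanyRelationship"]
    )

print(is_a(person_doc, "customer"), declared_but_missing(person_doc), company_roles(person_doc))
```

Adding a new subclass means adding a section to the document; existing sections and their readers are unaffected.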

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path

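To see why unique property names matter when equality lookups are keyed by name alone, here is a toy index in plain Python. It is not MarkLogic's index implementation; build_equality_index is a hypothetical helper that maps (property name, value) pairs to documents while ignoring where the property sits in the hierarchy.

```python
from collections import defaultdict

def build_equality_index(docs):
    """Map (property name, value) -> set of document keys, ignoring hierarchy."""
    index = defaultdict(set)

    def walk(node, doc_key, name=None):
        if isinstance(node, dict):
            for child_name, child in node.items():
                walk(child, doc_key, child_name)
        elif isinstance(node, list):
            for item in node:          # array items keep the array's property name
                walk(item, doc_key, name)
        elif name is not None:
            index[(name, node)].add(doc_key)

    for doc_key, doc in docs.items():
        walk(doc, doc_key)
    return index

# Generic names: a lookup on "name" cannot tell a person from an order's customer.
generic = build_equality_index({
    "person-11": {"person": {"name": "Mike Bowers"}},
    "order-1": {"order": {"customer": {"name": "Mike Bowers"}}},
})
# Unique names: the property name itself carries the context.
unique = build_equality_index({
    "person-11": {"person": {"personName": "Mike Bowers"}},
    "order-1": {"order": {"customer": {"customerName": "Mike Bowers"}}},
})

print(sorted(generic[("name", "Mike Bowers")]))
print(sorted(unique[("personName", "Mike Bowers")]))
```

With generic names the lookup returns both documents; with unique names it returns only the person document.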

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O
• MongoDB indexes are B-tree indexes and may require 4 I/Os
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
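The I/O estimate on this slide can be reproduced with a few lines of arithmetic. This is only a sketch of the slide's own numbers; the function and constant names are hypothetical, and real I/O counts depend on caching, index depth, and storage layout.

```python
# Back-of-the-envelope I/O estimate using the figures quoted on the slide.
# All names here are illustrative; only the numbers come from the slide.
TABLES_TO_JOIN = 13          # tables needed to represent one person
INDEX_IOS_PER_JOIN = (3, 4)  # (low, high) I/Os spent on indexes per join
ROW_IOS_PER_JOIN = (1, 2)    # (low, high) I/Os spent fetching the row per join
DOCUMENT_IOS = 1             # the slide's figure for one document read


def relational_io_range(joins, index_ios, row_ios):
    """Return the (low, high) I/O estimate for assembling one entity."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


low, high = relational_io_range(TABLES_TO_JOIN, INDEX_IOS_PER_JOIN, ROW_IOS_PER_JOIN)
print(f"Relational: {low}-to-{high} I/Os")
print(f"Document:   {DOCUMENT_IOS} I/O")
```

The per-join cost is the sum of the index and row I/Os (4 at the low end, 6 at the high end), which is where the 52-to-78 range comes from.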

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ],
          "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } }
    ],
    "customer": {
      "customerStatus": "active",
      "customerJoinDate": "2001-01-01",
      "customerReviewerRank": 11111
    }
  }
}

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Templated Data Extraction.

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]

The same data in a relational model takes three tables:
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
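To make the comparison concrete, the sketch below splits one person document into rows for the three tables listed on this slide. It is only an illustration: the `shred` function, the snake_case column names, and the surrogate `address_id` values are assumptions, and the "address" wrapper key is dropped for brevity.

```python
# One document holding a multi-valued address (values from the slide).
person = {
    "personId": 11,
    "personAddresses": [
        {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }
    ],
}


def shred(doc):
    """Split one person document into rows for three relational tables.

    Hypothetical helper: address_id is a made-up surrogate key.
    """
    persons = [{"person_id": doc["personId"]}]
    addresses = []
    persons_addresses = []
    for address_id, address in enumerate(doc["personAddresses"], start=1):
        addresses.append({
            "address_id": address_id,
            "street": ", ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # Each purpose in the multi-valued array becomes its own join row.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "person_id": doc["personId"],
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred(person)
print(len(persons), len(addresses), len(persons_addresses))
```

One address with two purposes becomes one Addresses row plus two Persons Addresses rows, which is the shredding that the document model avoids.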

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa",
      "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234",
      "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS",
      "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking",
      "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111",
      "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111",
      "driversLicenseState": "UT" } }
]
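A short sketch of how application code might consume such a mixed array: each record carries its own type, so the code branches on `paymentMethodType`. The `describe` function is hypothetical, and the "paymentMethod" wrapper key is omitted for brevity.

```python
# Two differently shaped records in one array (values from the slide).
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    },
]


def describe(method):
    """Summarize a payment method based on the type it declares."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"card ending {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"bank account ending {method['bankAccountNumber'][-4:]}"
    return f"unrecognized payment type {kind!r}"


summaries = [describe(method) for method in payment_methods]
print(summaries)
```

In a relational schema the two record shapes would typically need separate tables or a wide table with many nullable columns.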


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, values

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
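The idea that structure can be queried and changed like any other data can be sketched in plain Python. This is not MarkLogic's query API; `key_paths` and the `personNickname` property are hypothetical, and the "phone" wrapper key is omitted for brevity.

```python
person = {
    "personId": 11,
    "person": {
        "personName": "Some very long name",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}


def key_paths(node, prefix=()):
    """Yield every key path found in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + (key,)
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, prefix)


# Query the structure: is hidePhoneNumber present under personPhones?
paths = set(key_paths(person))
has_hidden_phone = ("person", "personPhones", "hidePhoneNumber") in paths

# Change the structure by updating data: add a property no schema declared.
person["person"]["personNickname"] = "Mike"
paths_after = set(key_paths(person))
print(has_hidden_phone, ("person", "personNickname") in paths_after)
```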

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers can be integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

Document PROs: Meaningful Data Modeling

[Figure: an excerpt from E. F. Codd's 1970 paper, tagged with entity types and connected by labeled relationships, next to a graph of JSON and XML documents.]

Excerpt shown in the figure:

A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: … Tables … represent a major advance toward the goal of data independence …
1.2.1 Ordering Dependence: … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: … Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

Entity types tagged in the excerpt: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event

Relationships between tagged entities: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution

Documents in the graph: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), Documentation (XML)

Relationships between documents:
• Customer Order
• Product that was ordered
• Vendor who sold the product
• Shipping instructions, etc.
• Customer invoices, annual reports, etc.
• Customer order documents
• Product manual
• Customer liked product
• Vendor received RMA on product

Inside and Out



Church of Jesus Christ of Latter-day Saints
• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Six Data Paradigms

[Figure: six data paradigms arranged from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures). Axes: low-latency operational velocity (hours, minutes, seconds, milliseconds, microseconds; 500 tps, 1,000 tps, 10K tps, 100K tps) and high-bandwidth analytical volume (GB, TB, PB). Each paradigm is illustrated with the same hospital example: surgeons, operations, hospitals, and drug doses.]

• Dimensional: Kimball data warehousing
• Relational: fixed, dense tables with flexible queries & joins
• Wide Column: fixed, dense tables w/ fixed queries, no joins
• Key Value: predefined data structures (key/value, hash, list, set)
• Document: sparse, variable data structures
• Graph: unlimited relationships

Top Ten Databases Overall

[Figure: the same chart, with ranked products placed on each paradigm.]

• Graph (RDF): 6 MarkLogic
• Dimensional (Data Warehouse): 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Dimensional (Live Analytics): 2 Oracle Exalytics, 3 SAP HANA
• Relational (newSQL): 2 Oracle x10
• Relational (SQL): 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Document (XML Doc Warehouse): 6 MarkLogic
• Document (JSON): 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key Value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide Column (complex key): 9 DataStax Cassandra

Do we combine multiple models?

[Figure: the six-paradigm chart repeated (Dimensional, Relational, Wide Column, Key Value, Document, Graph), with MarkLogic marked on several paradigms, including Graph (RDF), Document (XML Doc Warehouse, JSON) and Operational, and labeled "1 Multi-model NoSQL Database".]

Multi-Model Enterprise NoSQL Databases

Power of Combining XML Document and RDF Graph

[Figure: the excerpt from E. F. Codd's June 1970 paper "A Relational Model of Data for Large Shared Data Banks", tagged with entity types and connected by labeled relationships.]

+ Data/Narrative + Relationships = Contextual Information
(Semantic & Structural) = Meaningful Knowledge

Entity types tagged in the excerpt: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event

Relationships between tagged entities: geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, purpose, solution

WAIT A MINUTE

Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy
2. They contain redundant data throughout the hierarchy that is hard to update
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities
3. They are easily inconsistent because it is easy to fail to update redundant data
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL
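The first answer above, queries that keep working when the hierarchy changes, can be illustrated with a lookup that ignores nesting depth. This is a plain-Python sketch of the idea only, not MarkLogic's indexes or APIs; `find_values` and the two sample documents are hypothetical.

```python
def find_values(node, key):
    """Collect every value stored under `key`, at any depth."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


# The same fact stored at two different depths.
flat_order = {"orderId": 1, "customerName": "Mike Bowers"}
nested_order = {"orderId": 1, "order": {"customer": {"customerName": "Mike Bowers"}}}

# The query is identical for both shapes, so restructuring does not break it.
flat_result = find_values(flat_order, "customerName")
nested_result = find_values(nested_order, "customerName")
print(flat_result == nested_result)
```

A program that hard-coded the path `order.customer.customerName` would fail on the flat shape, which is the access path dependence Codd described.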

RDF Graph and XML enable us to turn content into meaningful knowledge.

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries

[Figure: JSON document collections (Customers, Orders, Products, Vendors) and XML Documentation connected by labeled graph relationships.]

Relationships shown:
• Customer Order
• Product that was ordered
• Vendor who sold the product
• Shipping instructions, etc.
• Customer invoices, annual reports, etc.
• Customer order documents, invoices, etc.
• Product manual
• Vendor order forms, invoices, etc.
• Customer liked product
• Vendor received RMA on product
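Relationships like those in the figure can be modeled as subject-predicate-object triples that point at documents. The sketch below is an assumption-laden illustration: the document URIs, predicate names, and the `related` helper are all hypothetical, and a real deployment would use an RDF store and existing ontologies rather than a Python list.

```python
# Documents keyed by hypothetical URIs; field values come from the sample data.
documents = {
    "customer/11": {"customerName": "Mike Bowers"},
    "order/1": {"orderNumber": "111-11-111"},
    "product/1111": {"productName": "Oreo Thins Sandwich Cookies"},
    "company/555": {"companyName": "Nabisco"},
}

# Relationships are data: (subject, predicate, object) with made-up predicates.
triples = [
    ("customer/11", "placed", "order/1"),
    ("order/1", "contains", "product/1111"),
    ("company/555", "sold", "product/1111"),
    ("customer/11", "liked", "product/1111"),
]


def related(subject, predicate=None):
    """Return (predicate, object) pairs for a subject, optionally filtered."""
    return [
        (p, o) for s, p, o in triples
        if s == subject and (predicate is None or p == predicate)
    ]


# A new kind of relationship is added at run time without any schema change.
triples.append(("company/555", "receivedRmaOn", "product/1111"))

customer_links = related("customer/11")
vendor_links = related("company/555")
liked_names = [documents[o]["productName"] for _, o in related("customer/11", "liked")]
print(customer_links, vendor_links, liked_names)
```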

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.

Why is relational fixed and dense?

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID | Surgeon Name  | Surgeon Specialty
1          | Dorothy Oz    | Cardiothoracic
2          | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13           | 1          | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13           | 100     | 200       | mg            | NULL
13           | 101     | NULL      | NULL          |
13           | 100     | 150       | mg            | NULL

Drug ID | Drug Name
100     | Minocycline
101     | Minomycin

Callouts:
• Fixed field types
• Fixed table structure
• Fixed table relationships
• Each row has same structure
• Fixed set of tables

How does NoSQL support variety and variability?

NoSQL documents have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

The second administered drug (Minomycin) omits the dose fields: a sparse property.

Callouts:
• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties
• Sparse Denormalized Properties
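Sparse properties mean that reading code must tolerate missing keys rather than NULL columns. The sketch below uses the administered drugs from the operation document; `dose_label` is a hypothetical helper, not part of any library.

```python
# The administered drugs from the slide; the second record has no dose fields.
administered_drugs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def dose_label(drug):
    """Format a dose, handling records where the dose was never stored."""
    size = drug.get("drugDoseSize")
    if size is None:
        return "dose not recorded"
    return f"{size} {drug.get('drugDoseUOM', '')}".strip()


labels = [dose_label(drug) for drug in administered_drugs]
# Querying structure: which records actually carry a dose?
with_dose = [d["drugManufacturer"] for d in administered_drugs if "drugDoseSize" in d]
print(labels, with_dose)
```

The missing dose takes no space in the document, whereas the relational version of the same data stores explicit NULLs in a fixed row structure.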

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date <date xsi:type="xsd:date">2017-04-02Z</date>, to be freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." }
    ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA.
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything.
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.


JSON is ideal for data structures

Developers work with data with maximal reliance on predictable structures

XML is ideal for content

Developers work with tagged content with minimal reliance on variable structure


Choosing Between XML and JSON
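The contrast between tagged text and nested objects can be seen by reading the same date out of each format. The sketch below uses only Python's standard library; the shortened sample strings are adapted from the slide examples and are assumptions, not the full originals.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"
)
json_text = json.dumps({"paragraph": [
    {"type": "text", "value": "Data exists in separate objects"},
    {"type": "date", "value": "2017-04-02Z"},
]})

# XML: the date is a child element sitting inside running text (mixed content).
paragraph = ET.fromstring(xml_text)
full_text = "".join(paragraph.itertext())
xml_date = paragraph.find("date").text

# JSON: text and data are separate objects, reached through predictable keys.
parts = json.loads(json_text)["paragraph"]
json_date = next(p["value"] for p in parts if p["type"] == "date")

print(full_text)
print(xml_date, json_date)
```

In the XML case the surrounding sentence survives intact around the tag; in the JSON case the sentence has to be cut into objects, which is why the slides recommend JSON for data and XML for content.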

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.

Person ERD
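As a rough illustration of folding several person tables into one document, the Python sketch below nests rows that share a person ID. The row shapes and the `build_person_document` helper are hypothetical; only three of the tables are shown to keep it short.

```python
# Hypothetical rows, as they might come back from three of the person tables.
persons = [{"person_id": 11, "name": "Mike Bowers"}]
person_phones = [
    {"person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails = [
    {"person_id": 11, "purpose": "work", "address": "mike@somework.com"},
]


def build_person_document(person_id):
    """Fold the rows that share one person_id into a single nested document."""
    person = next(p for p in persons if p["person_id"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": person["name"],
            "personPhones": [
                {"phonePurpose": r["purpose"], "phoneNumber": r["number"]}
                for r in person_phones if r["person_id"] == person_id
            ],
            "personEmails": [
                {"emailPurpose": r["purpose"], "emailAddress": r["address"]}
                for r in person_emails if r["person_id"] == person_id
            ],
        },
    }


doc = build_person_document(11)
print(len(doc["person"]["personPhones"]), "phones in one document")
```

Each multivalued property becomes an array inside the entity, so the join keys that tied the tables together disappear from the model.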

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}


[Diagram: a large entity-relationship diagram in which the customer table shapes are repeated under placeholder names such as Order, Order Items, Lookup Lists, and Fact]

More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities


[Diagram: Customer Order model with four kinds of JSON documents (Customers, Orders, Products, Vendors) connected by graph relationships such as "Product that was ordered" and "Vendor who sold the product"]
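A small Python sketch of following such relationships from one business entity to another. The in-memory `documents` store and the `vendors_for_order` helper are hypothetical; in a real database the lookups would be index-backed queries rather than dictionary access.

```python
# Hypothetical in-memory document store keyed by (document type, id).
documents = {
    ("order", 1): {"orderedProducts": [{"productIdFk": 1111}]},
    ("product", 1111): {
        "productName": "Oreo Thins Sandwich Cookies",
        "productVendor": {"vendorId": 555},
    },
    ("company", 555): {"companyName": "Nabisco"},
}


def vendors_for_order(order_id):
    """Follow order -> product -> vendor and return the vendor names."""
    names = []
    for item in documents[("order", order_id)]["orderedProducts"]:
        product = documents[("product", item["productIdFk"])]
        vendor = documents[("company", product["productVendor"]["vendorId"])]
        names.append(vendor["companyName"])
    return names


print(vendors_for_order(1))
```

Each hop corresponds to one named relationship in the diagram, so the traversal reads the same way a person would describe it.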

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New"
          …

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1, …
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 …
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…
          "productWidth": 5, "productHeight": 2, "productLength": 8 …
    "productWebsites": [ …

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" …
      { "phone": { "phonePurpose": "support" …
      { "phone": { "phonePurpose": "shipping" …
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping" …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936" …
    "companyEmails": [
      { "email": { "emailPurpose": "sales" …
    "companyWebsites": [
      { "website": { "websitePurpose": "home" …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555…
          "paymentMethodVerified": true …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables


[Diagram labels: One-to-one, One-to-many, Many-to-many, Reference Tables]
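The normalization steps above can be sketched with sqlite3: a multivalued attribute (a person's phones) moves into its own table with a primary key and is reached through a one-to-many join. Table and column names are hypothetical, loosely following the Persons and Person Phones tables shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (
    person_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- The multivalued phone attribute becomes a child table (one-to-many).
CREATE TABLE person_phones (
    phone_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    phone_purpose TEXT NOT NULL,
    phone_number TEXT NOT NULL
);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO person_phones VALUES (1, 11, 'mobile', '+1 (111) 111-1111');
INSERT INTO person_phones VALUES (2, 11, 'work', '+1 (111) 111-1112');
""")

# Reassembling the person requires a join across the two tables.
phones = conn.execute(
    "SELECT p.name, ph.phone_purpose, ph.phone_number "
    "FROM persons p JOIN person_phones ph ON ph.person_id = p.person_id "
    "WHERE p.person_id = ? ORDER BY ph.phone_id",
    (11,),
).fetchall()
print(phones)
```

Every additional multivalued attribute repeats this pattern, which is how a single person grows into many tables.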

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and Optic API can query it
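To illustrate how an embedded, normalized structure can still be read as flat rows, here is a plain-Python sketch. It only imitates the idea of projecting document data into an index; it is not MarkLogic's TDE or Optic API, and the `phone_rows` helper is a hypothetical name.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ]
    },
}


def phone_rows(doc):
    """Project the embedded phone array into (personId, purpose, number) rows."""
    rows = []
    for entry in doc["person"]["personPhones"]:
        phone = entry["phone"]
        for purpose in phone["phonePurpose"]:
            rows.append((doc["personId"], purpose, phone["phoneNumber"]))
    return rows


for row in phone_rows(person_doc):
    print(row)
```

The document stays the single source of truth, while the projected rows give query code a tabular view when it needs one.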


DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
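The idea of "denormalizing without denormalizing" can be imitated in a few lines of Python: facts are extracted from each document into a shared list of triples, and a query joins them at read time. This is a conceptual stand-in only, not MarkLogic's triple index, TDE, or Optic API; the `hasName` predicate and both helper functions are assumptions made for the example.

```python
docs = [
    {
        "personId": 11,
        "person": {"personName": "Mike Bowers"},
        "triples": [
            {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
        ],
    },
    {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
]


def build_triple_index(documents):
    """Collect embedded triples and project each entity's name as a triple."""
    index = []
    for doc in documents:
        for entry in doc.get("triples", []):
            t = entry["triple"]
            index.append((t["subject"], t["predicate"], t["object"]))
        if "person" in doc:
            index.append((doc["personId"], "hasName", doc["person"]["personName"]))
        if "company" in doc:
            index.append((doc["companyId"], "hasName", doc["company"]["companyName"]))
    return index


def employer_names(index, person_id):
    """Join hasEmployer triples with hasName triples for the employers."""
    employers = {o for s, p, o in index if s == person_id and p == "hasEmployer"}
    return sorted(o for s, p, o in index if p == "hasName" and s in employers)


index = build_triple_index(docs)
print(employer_names(index, 11))
```

The company name is stored once, in the company document, yet the query answers as if it had been copied into the person document.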


Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context


[Diagram labels: Business Tables, Transaction Tables, Reference Tables]


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc
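A minimal Python sketch of these points: one order document carries what the transaction needs, and a single lookup document combines several status lists. The documents are abbreviated from the slide examples, and the `invalid_statuses` helper is a hypothetical name.

```python
# One lookup document combining what would be two reference tables.
lookups = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}

# The order holds everything about the transaction, including names.
order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"productIdFk": 1111,
             "productName": "Oreo Thins Sandwich Cookies",
             "productOrderStatus": "Shipped"},
        ],
    },
}


def invalid_statuses(order_doc, lookup_doc):
    """Return (property, value) pairs whose value is not in the lookup lists."""
    body = order_doc["order"]
    problems = []
    if body["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(("orderStatus", body["orderStatus"]))
    for item in body["orderedProducts"]:
        if item["productOrderStatus"] not in lookup_doc["productOrderStatus"]:
            problems.append(("productOrderStatus", item["productOrderStatus"]))
    return problems


print(invalid_statuses(order, lookups))
```

The order can be displayed without touching the customer or product documents, and the lookup document plays the role that several reference tables would play in a relational model.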

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model


DocGraph Modeling: Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below

personId 11
schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]
triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null
]
person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free
  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
    phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
    phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
  ]
  personAddresses [
    address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
      addressLocality Clinton addressRegion UT
      addressPostalCode 84015 addressCountry United States
  ]
  personEmails [
    email emailPurpose [ work primary ] emailAddress mike@somework.com
    email emailPurpose [ personal ] emailAddress mike@somewhere.com
  ]
  personWebsites [
    website websitePurpose [ work ] websiteUrl http://www.mike.com
    website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike
  ]
  personPaymentMethods [
    paymentMethod paymentMethodType Visa paymentMethodStatus Verified
      creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
    paymentMethod paymentMethodType Checking paymentMethodStatus Verified
      bankRoutingNumber 11111111 bankAccountNumber 11111111
      nameOnBankAccount Michael Bowers
      driversLicenseNumber 11111111 driversLicenseState UT
  ]
customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111
companyRelationships [
  company companyIdFk 555 companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company companyIdFk 666 companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]
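The subclassing in this example can be sketched in a few lines: the `schemas` array names the facets a document carries, and each facet is a top-level section of the same document. The exact JSON nesting is an assumption reconstructed from the flattened slide text, and the helper names `roles` and `has_role` are hypothetical.

```python
# Trimmed, hypothetical rendering of the generalized person document.
# The nesting is an assumption; values are copied from the slide text.
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": "10"}},
        {"schema": {"type": "customer", "schemaVersion": "10"}},
        {"schema": {"type": "companyRelationships", "schemaVersion": "10"}},
    ],
    "person": {"personStatus": "active", "personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
    ],
}


def roles(doc: dict) -> list:
    """List the facet types a document declares in its schemas array."""
    return [entry["schema"]["type"] for entry in doc.get("schemas", [])]


def has_role(doc: dict, role: str) -> bool:
    """A role counts only if it is declared and its section is present."""
    return role in roles(doc) and role in doc


print(roles(person_doc))
print(has_role(person_doc, "customer"))
print(has_role(person_doc, "deliveryAgent"))
```

Adding a new subclass in this sketch means adding one schema entry and one section, with no change to the sections already there.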

Relational Modeling Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
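To see why unique property names matter, here is a toy name-based equality index. It is a simplified model for illustration only, not MarkLogic's implementation; the document URIs and the function names are hypothetical.

```python
from collections import defaultdict


def name_value_pairs(node, name=None):
    """Yield (property_name, scalar_value) for every scalar at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from name_value_pairs(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from name_value_pairs(item, name)
    else:
        yield name, node


def build_index(docs: dict) -> dict:
    """Map (property_name, value) to the set of document URIs containing it."""
    index = defaultdict(set)
    for uri, doc in docs.items():
        for name, value in name_value_pairs(doc):
            index[(name, value)].add(uri)
    return index


docs = {
    "/person/11.json": {"person": {"personName": "Mike Bowers", "personAddresses": [
        {"address": {"addressRegion": "UT", "addressLocality": "Clinton"}}]}},
    "/order/1.json": {"order": {"customer": {"customerName": "Mike Bowers"},
        "orderShippingAddress": {"addressRegion": "UT"}}},
}

index = build_index(docs)
print(sorted(index[("addressRegion", "UT")]))     # matched at any depth
print(sorted(index[("personName", "Mike Bowers")]))  # only the person doc
```

Because the index key ignores the path, a generic name such as `name` would mix persons, customers, and companies in one lookup, while `personName` and `customerName` keep them apart.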

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Document PROs Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mike@somework.com ]

personWebsites [

website websitePurpose [ work ] websiteUrl http://www.mike.com ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T09:00:00Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Document PROs Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
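The IO estimate above is plain arithmetic, worked here as a small sketch. The per-join figures are the slide's own; the constant and function names are hypothetical.

```python
# Figures taken from the slide: 13 tables, 3-to-4 index IOs and
# 1-to-2 row IOs per join. Names are illustrative only.
TABLES_JOINED = 13
INDEX_IOS = (3, 4)   # (low, high) IOs to walk the indexes for one join
ROW_IOS = (1, 2)     # (low, high) IOs to fetch the row for one join
DOCUMENT_IOS = 1     # the slide's figure for reading one document


def ios_per_join() -> tuple:
    """Low and high IO counts for a single join."""
    return INDEX_IOS[0] + ROW_IOS[0], INDEX_IOS[1] + ROW_IOS[1]


def relational_io_range(tables: int = TABLES_JOINED) -> tuple:
    """Low and high IO counts for joining the given number of tables."""
    low, high = ios_per_join()
    return tables * low, tables * high


print(ios_per_join())          # (4, 6)
print(relational_io_range())   # (52, 78)
print(DOCUMENT_IOS)            # 1
```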

Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction

Document PROs Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
]

Relational tables for the same data:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
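The contrast between the three tables and the single array can be sketched as follows. The row values for `addressId` and `isPrimary`, the lowercase column names, and both function names are assumptions made for illustration; the address values come from the sample.

```python
from collections import defaultdict

# Relational shape: a junction table carries the many-to-many link.
addresses = [{"addressId": 1, "street": "99 Smith Dr", "locality": "Clinton",
              "region": "UT", "postalCode": "84015", "country": "United States"}]
persons_addresses = [  # addressId and isPrimary values are hypothetical
    {"personId": 11, "addressId": 1, "addressPurpose": "shipping", "isPrimary": True},
    {"personId": 11, "addressId": 1, "addressPurpose": "mailing", "isPrimary": False},
]


def purposes_relational(person_id: int) -> dict:
    """Join the junction table to addresses: street -> set of purposes."""
    by_id = {row["addressId"]: row for row in addresses}
    result = defaultdict(set)
    for link in persons_addresses:
        if link["personId"] == person_id:
            result[by_id[link["addressId"]]["street"]].add(link["addressPurpose"])
    return dict(result)


# Document shape: the multi-valued addressPurpose array replaces the junction rows.
person_doc = {"personId": 11, "person": {"personAddresses": [
    {"address": {"addressPurpose": ["shipping", "mailing"],
                 "addressStreet": ["99 Smith Dr"], "addressRegion": "UT"}}]}}


def purposes_document(doc: dict) -> dict:
    """Read the same answer straight from the embedded array."""
    result = {}
    for item in doc["person"]["personAddresses"]:
        address = item["address"]
        result[" ".join(address["addressStreet"])] = set(address["addressPurpose"])
    return result


print(purposes_relational(11) == purposes_document(person_doc))
```

Both functions return the same mapping; the document version needs no lookup by key because the relationship is stored where it is used.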

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed
• Different types of records in the same array is natural to do in programming but very difficult to do in relational databases

personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111 driversLicenseState UT
]
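A short sketch of how application code might consume such a mixed array: each record is handled according to the properties it actually has. The JSON nesting is an assumption reconstructed from the flattened sample, and `describe` is a hypothetical helper.

```python
# Two record shapes in one array, as in the sample above.
person_payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111",
                       "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers"}},
]


def describe(method: dict) -> str:
    """Summarize a payment method based on which properties are present."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


for entry in person_payment_methods:
    print(describe(entry["paymentMethod"]))
```

A relational design would typically need one table per record shape, or one wide table with mostly empty columns, to hold the same two records.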

Document PROs Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data, just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 5.75
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
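Querying structure can be sketched with a recursive walk that reports every path at which a key is present. The path notation and the function name `paths_with_key` are hypothetical, and the document nesting is an assumption based on the sample above.

```python
def paths_with_key(node, key, path="$"):
    """Yield the path of every occurrence of key, at any nesting depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}.{name}"
            if name == key:
                yield child
            yield from paths_with_key(value, key, child)
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from paths_with_key(item, key, f"{path}[{position}]")


doc = {"personId": 11, "person": {"personBirthDate": "1982-02-02", "personPhones": [
    {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
    {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
               "hidePhoneNumber": True}},
]}}

# Query the structure: where does hidePhoneNumber exist?
print(list(paths_with_key(doc, "hidePhoneNumber")))

# Change the structure by updating data: add the key to the first phone.
doc["person"]["personPhones"][0]["phone"]["hidePhoneNumber"] = False
print(len(list(paths_with_key(doc, "hidePhoneNumber"))))
```

No schema change is involved in the second step; the new property exists as soon as the data is written.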

Document PROs Simple Easy Data Types
• Attribute names have unlimited length and can contain any characters
• Strings have unlimited size and can contain any internationalized UTF-8 characters
• Numbers contain integers or decimals
• Booleans are true or false
• Arrays can have values of any type and can mix types
• Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

Document PROs Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths

[Figure: a graph of nodes labeled T (Topic), P (Person), L (Location), P (Publication), R (Reference), O (Organization), and E (Event), connected by edges labeled "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution".]

[Figure: "Customer Order". JSON collections (Customers, Orders, Products, Vendors) and an XML Documentation collection, connected by relationships labeled "Product that was ordered", "Vendor who sold the product", "Shipping instructions etc.", "Customer invoices, annual reports etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product". The XML sample shows a hospital operation with its drug doses. Caption fragment: "Inside and Out".]

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size    Dose UOM
Minicillan    Drugs R Us           200          mg
Maxicillan    Canada4Less Drugs    400          mg
Minicillan    Drug USA             150          mg

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Six Data Paradigms
• Dimensional: Kimball Data Warehousing
• Wide Column: Fixed, Dense Tables w/ Fixed Queries, No Joins
• Document: Sparse, Variable Data Structures
• Relational: Fixed, Dense Tables with Flexible Queries & Joins
• Graph: Unlimited Relationships
• Key Value: Predefined Data Structures
  key: value 1
  hash: [ key 1: value 1, key 2: value 2 ]
  list: value 1, list value 2
  set: [ set value 1, set value 2 ]

[Figure axes: "Low Latency, Operational Velocity" (500 tps, 1000 tps, 10K tps, 100K tps; hours, minutes, seconds, milliseconds, microseconds) and "High Bandwidth, Analytical Volume" (PB, TB, GB). A third axis runs from "Fixed structure: code has more dependence on DB structures" to "Flexible structure: code has less dependence on DB structures".]

Products shown in each panel, with the numbers as they appear on the slide:
• Graph (RDF): 6 MarkLogic
• Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
• newSQL: 2 Oracle x10
• SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Doc Warehouse (XML): 6 MarkLogic
• Document (JSON): 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key Value (Simple Key): 5 AWS DynamoDB, 10 Redis
• Wide-Column (Complex Key): 9 DataStax Cassandra

[Figure panels: a graph (RDF) of Surgeon, Operation, Person, and Hospital nodes with edges labeled "performed on", "works at", "at", "friend of", "likes", "has poor success at", and "fails at"; a star schema with Hospital, Surgeon, Operation, and Drug dimensions (each a key plus attributes) around a Drug Dose Facts table (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose); an entity-relationship diagram of Surgeon (Surgeon Number, Surgeon First Name, Surgeon Last Name), Operation Codes (Operation Code, Operation Name), Operation (Hospital Number, Operation Number, Surgeon Number, Operation Code), and Hospital (Hospital Number, Hospital Name) with relationships "performed at", "schedules", "performed by", "performs", "classified by", and "classifies"; and an XML document for Hospital Name John Hopkins, Operation Number 13, Operation Type Heart Transplant, Surgeon Name Dorothy Oz, with drug doses Minicillan (Drugs R Us, 200 mg), Maxicillan (Canada4Less Drugs, 400 mg), and Minicillan (Drug USA, 150 mg).]

Paradigms: Graph, Dimensional, Relational, Document, Wide Column, Key Value

Top Ten Databases Overall

Do we combine multiple models?

[Figure: the six-paradigm chart repeated (Dimensional, Relational, Wide Column, Key Value, Document, Graph) with the same velocity, volume, and fixed-to-flexible structure axes.]

Multi-Model Enterprise NoSQL Databases

[Figure: the same paradigm chart with MarkLogic, labeled "1 Multi-model NoSQL Database" and "Operational", placed over the Graph (RDF), Document (XML doc warehouse and JSON document), Key Value (simple key), and Wide Column (complex key) paradigms, next to the SQL, newSQL, Live Analytics, and Data Warehouse example models.]

Power of Combining XML Document and RDF Graph

"A Relational Model of Data for Large Shared Data Banks," E. F. Codd, IBM Research Laboratory, San Jose, California. Information Retrieval, Volume 13, Number 6, June 1970.

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

+ Data/Narrative + Relationships = Contextual Information

1. Relational Model and Normal Form
1.1 INTRODUCTION. This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS. …Tables…represent a major advance toward the goal of data independence…

1.2.1 Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence. …Can application programs…remain invariant as indices come and go? …

1.2.3 Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

(Semantic & Structural) = Meaningful Knowledge

[Figure: a graph of content items connected by labeled relationships.
Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event.
Relationship labels: geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, solution, purpose.]

WAIT A MINUTE! Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy

2. They contain redundant data throughout the hierarchy that is hard to update
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities

3. They are easily inconsistent because it is easy to fail to update redundant data
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL
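The answer to the first objection can be illustrated with a minimal Python sketch. This is not MarkLogic's API; `find_values` and both sample documents are hypothetical, and the point is only that a query naming a property, rather than the path to it, keeps working when the hierarchy changes.

```python
from typing import Any, Iterator


def find_values(node: Any, name: str) -> Iterator[Any]:
    """Yield every value stored under `name`, at any depth of a JSON-like tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                yield value
            # Keep descending: the same name may also appear deeper down.
            yield from find_values(value, name)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, name)


# Hypothetical documents: the same data stored in two different hierarchies.
flat_doc = {"surgeonName": "Dorothy Oz", "operationNumber": 13}
nested_doc = {"operation": {"staff": {"surgeonName": "Dorothy Oz"}, "operationNumber": 13}}

# The query is identical for both shapes.
print(list(find_values(flat_doc, "surgeonName")))
print(list(find_values(nested_doc, "surgeonName")))
```

A program written against a fixed access path such as `doc["operation"]["staff"]["surgeonName"]` would fail on the flat document, which is the access path dependence Codd describes.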


RDF Graph and XML enable us to turn content into meaningful knowledge

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational

• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity

• Meaningful graph relationships connect any business entity to any other entity (or any content), in any way and in as many ways as you want, at run time, across all table boundaries
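As a rough illustration of these two bullets, the sketch below keeps each business entity as one document and stores relationships as subject-predicate-object triples. The document IDs, predicate names, and the `related` helper are all assumptions made for the example; a real RDF store would use IRIs and a query language such as SPARQL.

```python
# Hypothetical documents: each one is a whole business entity.
documents = {
    "customer/11": {"type": "customer", "name": "Mike Bowers"},
    "order/1": {"type": "order", "orderNumber": "111-11-111"},
    "product/1111": {"type": "product", "name": "Oreo Thins Sandwich Cookies"},
    "vendor/555": {"type": "vendor", "name": "Nabisco"},
}

# Relationships are data: (subject, predicate, object) rows, not schema.
triples = [
    ("customer/11", "placedOrder", "order/1"),
    ("order/1", "containsProduct", "product/1111"),
    ("vendor/555", "soldProduct", "product/1111"),
    ("customer/11", "likesProduct", "product/1111"),
]


def related(subject, predicate=None):
    """Return (predicate, document) pairs one hop away from `subject`."""
    return [
        (p, documents[o])
        for s, p, o in triples
        if s == subject and (predicate is None or p == predicate)
    ]


# A new kind of relationship is added at run time as one more row of data.
triples.append(("vendor/555", "receivedRmaOn", "product/1111"))

print(related("customer/11", "likesProduct"))
print(related("vendor/555"))
```

Nothing about the customer, order, product, or vendor documents had to change to add the new relationship.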

[Figure: Customer Order. JSON documents for Customers, Orders, Products, and Vendors, plus XML Documentation (the operation record is shown as the sample XML document), connected by graph relationships:
– Product that was ordered
– Vendor who sold the product
– Shipping instructions, etc.
– Customer invoices, annual reports, etc.
– Customer order documents, invoices, etc.
– Product manual, vendor order forms, invoices, etc.
– Customer liked product
– Vendor received RMA on product]

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names

• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero to a few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.

Why is relational fixed and dense?

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL |
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Callouts: Fixed field types. Fixed table structure. Fixed table relationships. Each row has same structure. Fixed set of tables.
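The fixed, dense structure can be seen in a small runnable sketch using Python's built-in `sqlite3` module. The table and column names are snake_case renderings of the slide's tables and are assumptions made for the example, as is the added `drug_manufacturer` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drugs (drug_id INTEGER PRIMARY KEY, drug_name TEXT NOT NULL);
    CREATE TABLE operation_drugs (
        operation_id  INTEGER NOT NULL,
        drug_id       INTEGER NOT NULL REFERENCES drugs(drug_id),
        drug_dose     INTEGER,   -- nullable: the only sparseness a row allows
        drug_dose_uom TEXT
    );
    INSERT INTO drugs VALUES (100, 'Minocycline'), (101, 'Minomycin');
    INSERT INTO operation_drugs VALUES
        (13, 100, 200, 'mg'), (13, 101, NULL, NULL), (13, 100, 150, 'mg');
""")

# Every row carries the same columns, whether or not a dose was recorded.
rows = conn.execute("""
    SELECT d.drug_name, od.drug_dose, od.drug_dose_uom
    FROM operation_drugs AS od
    JOIN drugs AS d ON d.drug_id = od.drug_id
    WHERE od.operation_id = 13
    ORDER BY od.rowid
""").fetchall()
print(rows)

# Recording a new kind of fact requires changing the schema first.
conn.execute("ALTER TABLE operation_drugs ADD COLUMN drug_manufacturer TEXT")
columns = [info[1] for info in conn.execute("PRAGMA table_info(operation_drugs)")]
print(columns)
```

The row with no dose still occupies both dose columns as NULLs, and the manufacturer cannot be stored for any row until the table itself has been altered.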

How does NoSQL support variety and variability?

NoSQL documents have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Callouts: Variable Data Types. Variable Document Types. Variable Relationships. Variable and Sparse Collections. Variable Document Structures. Sparse Properties (the second administered drug has no dose). Sparse Denormalized Properties.
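Application code that reads such a document has to tolerate properties that are simply absent. The following is a minimal sketch, assuming the `administeredDrugs` array shown on the slide; the `describe` helper and the total-dose calculation are hypothetical additions for illustration.

```python
administered_drugs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # sparse: no dose
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def describe(drug):
    """Format one entry, tolerating absent (sparse) properties."""
    dose = drug.get("drugDoseSize")
    if dose is None:
        return f"{drug['drugName']}: dose not recorded"
    return f"{drug['drugName']}: {dose} {drug.get('drugDoseUOM', '')}".rstrip()


# Only entries that actually carry a dose in mg contribute to the total.
total_mg = sum(d["drugDoseSize"] for d in administered_drugs if d.get("drugDoseUOM") == "mg")

for drug in administered_drugs:
    print(describe(drug))
print(total_mg)
```

Unlike the relational row, the entry without a dose stores no placeholder at all; the absence itself is the data.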

Choosing Between XML and JSON

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date <date xsi:type="xsd:date">2017-04-02Z</date>, to be
    freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything.
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects:" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." }
    ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA.
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
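The difference between tags sprinkled over text and text poured into objects shows up as soon as both forms are parsed. Below is a small sketch using only the Python standard library (`xml.etree.ElementTree` and `json`); the two sample strings are shortened versions of the slide's examples, without namespaces, and the variable names are assumptions.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text.</paragraph>"
)
json_text = """
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}
]}
"""

# XML: the date is an element embedded in running text (mixed content).
paragraph = ET.fromstring(xml_text)
full_text = "".join(paragraph.itertext())   # the text survives with tags removed
xml_date = paragraph.find("date").text

# JSON: text and data are sibling objects in an array.
parts = json.loads(json_text)["paragraph"]
json_date = next(p["value"] for p in parts if p["type"] == "date")

print(full_text)
print(xml_date, json_date)
```

Stripping the XML tags still leaves a readable sentence, whereas the JSON version has no sentence to recover: the date lives in its own object and the prose has to be assembled by the application.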

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Customer Model (Person ERD)

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Driver's License Num, Driver's License State
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Companies: Company ID
Companies Addresses: Company ID, Address ID
Companies Websites: Company ID, Website ID

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.
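To make the 13-tables-versus-one-document point concrete, here is a minimal sketch that folds rows from three of those tables back into a single nested document. The row data, the snake_case column names, and `person_document` are hypothetical; the output property names follow the person document used in this deck.

```python
# Hypothetical rows, shaped like three of the tables in the person ERD.
persons = [{"person_id": 11, "name": "Mike Bowers", "status": "active"}]
person_phones = [
    {"phone_id": 1, "person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"phone_id": 2, "person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails = [
    {"email_id": 1, "person_id": 11, "purpose": "work",
     "address": "mike@example.com", "is_primary": True},
]


def person_document(person_id):
    """Join the child tables back into one nested document for one person."""
    person = next(p for p in persons if p["person_id"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": person["name"],
            "personStatus": person["status"],
            # Each one-to-many table becomes an embedded array.
            "personPhones": [
                {"phonePurpose": row["purpose"], "phoneNumber": row["number"]}
                for row in person_phones if row["person_id"] == person_id
            ],
            "personEmails": [
                {"emailPurpose": row["purpose"], "emailAddress": row["address"]}
                for row in person_emails if row["person_id"] == person_id
            ],
        },
    }


doc = person_document(11)
print(doc["person"]["personPhones"])
```

The join keys (`phone_id`, `person_id`) disappear from the document because containment now expresses what the foreign keys used to.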

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr." ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}


More Realistic Customer Order Model

[Figure: a large entity-relationship diagram made of many tables, including Order, Order Items, Lookup Lists, and Fact tables. The extracted table labels are placeholders and the column lists repeat those of the Person ERD.]

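Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: Customer Order. JSON documents for Customers, Orders, Products, and Vendors, connected by the relationships "Product that was ordered" and "Vendor who sold the product". The four sample documents and the lookup document follow; values ending in … are cut off at the slide edge.]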

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr." ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers"
    },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": …
    },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr." ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New"
          …

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", … } }
    ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    ],
    …

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } }
    ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping" … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } }
    ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } }
    ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } }
    ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": "555…",
          "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}

Relational Modeling: Normalize

1. Normalize

• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

32

One-to-one Many-to-many Reference TablesOne-to-many
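As a rough illustration of the normalization steps above, the sketch below flattens a multivalued attribute (a person's phones) into a one-to-many child table using Python's built-in sqlite3 module. The table and column names are hypothetical and are not taken from the slides.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# One table per coherent context, one primary key per table.
# The multivalued "phones" attribute becomes a one-to-many child table.
conn.executescript("""
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT NOT NULL
    );
    CREATE TABLE person_phone (
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER NOT NULL REFERENCES person(person_id),
        phone_purpose TEXT NOT NULL,
        phone_number  TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.executemany(
    "INSERT INTO person_phone (person_id, phone_purpose, phone_number) "
    "VALUES (?, ?, ?)",
    [(11, "mobile", "+1 (111) 111-1111"), (11, "work", "+1 (111) 111-1112")],
)

# Reassembling the person requires a join across the two tables.
rows = conn.execute("""
    SELECT p.person_name, ph.phone_purpose, ph.phone_number
    FROM person AS p
    JOIN person_phone AS ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
```

Every additional multivalued attribute (addresses, emails, payment methods) would add another child table and another join.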

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
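A minimal sketch of the first three points, written as a plain Python dictionary serialized to JSON. The slide text has lost its braces and quotes, so the exact nesting here is an assumption; the helper `phones_for` is hypothetical.

```python
import json

# One business entity per document; phones are an embedded, multi-valued
# structure instead of a separate table. Nesting is assumed, not confirmed.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"],
             "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"],
             "phoneNumber": "+1 (111) 111-1112"},
        ],
    },
}


def phones_for(doc, purpose):
    """Return the phone numbers whose purpose list contains `purpose`."""
    return [
        phone["phoneNumber"]
        for phone in doc["person"]["personPhones"]
        if purpose in phone["phonePurpose"]
    ]


# The whole entity travels as one JSON string; no join is needed to read it.
serialized = json.dumps(person_doc)
```

Each embedded phone entry is itself normalized: it holds only phone attributes, and each attribute appears once.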

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
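The idea of following triples from one document to another can be sketched in plain Python. This is not the Optic API or TDE; it is only an in-memory illustration, and the `documents` store and `follow` helper are hypothetical. The ids and names come from the person document in the slides.

```python
# Hypothetical in-memory stand-in for a document store keyed by entity id.
documents = {
    "11": {"type": "person", "name": "Mike Bowers"},
    "666": {"type": "company", "name": "Goliath Dairies"},
    "1111": {"type": "product", "name": "Oreo Thins Sandwich Cookies"},
}

# (subject, predicate, object) triples, as in the person document's
# triples section.
triples = [
    ("11", "hasEmployer", "666"),
    ("11", "likesProduct", "1111"),
    ("11", "hasSpouse", "22"),  # document 22 is not loaded in this sketch
]


def follow(subject, predicate):
    """Return the loaded documents linked from `subject` by `predicate`."""
    return [
        documents[obj]
        for subj, pred, obj in triples
        if subj == subject and pred == predicate and obj in documents
    ]


employers = follow("11", "hasEmployer")
```

The employer's name is read through the link and is never copied into the person document, which is the sense in which the data is denormalized without being duplicated.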


Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

Figure labels: Business Tables, Transaction Tables, Reference Tables

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
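One way to read "a transaction should contain everything about the transaction" is that an order snapshots the customer and product data it needs at the time it is placed. The sketch below assumes that reading; `build_order` is a hypothetical helper, and the document nesting is assumed because the slide text has lost its braces.

```python
import copy


def build_order(order_id, person_doc, product_doc, quantity):
    """Build a self-contained order document from a person and a product."""
    person = person_doc["person"]
    # Pick the address flagged for shipping and copy it, so later edits to
    # the person document do not change the historical order.
    shipping = next(
        address for address in person["personAddresses"]
        if "shipping" in address["addressPurpose"]
    )
    snapshot = copy.deepcopy(shipping)
    snapshot.pop("addressPurpose")
    return {
        "orderId": order_id,
        "order": {
            "customer": {
                "customerId": person_doc["personId"],
                "customerName": person["personName"],
            },
            "orderShippingAddress": snapshot,
            "orderedProducts": [{
                "productIdFk": product_doc["productId"],
                "productName": product_doc["product"]["productName"],
                "productOrderQty": quantity,
            }],
        },
    }


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [{
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }],
    },
}
product_doc = {
    "productId": 1111,
    "product": {"productName": "Oreo Thins Sandwich Cookies"},
}

order_doc = build_order(1, person_doc, product_doc, 3)
```

The order keeps foreign keys (`customerId`, `productIdFk`) for navigation, while the copied name and address let it be read on its own.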

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because a JSON document is an object that can carry a nested section for each subclass
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
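The subclassing in the person document above can be sketched as optional sections on a general person document. The function `person_roles` is hypothetical, the nesting is assumed, and the second document is an invented example of a person with no roles.

```python
def person_roles(doc):
    """List the roles a person document carries, based on its sections."""
    roles = []
    # A "customer" section marks the person as a customer.
    if "customer" in doc:
        roles.append("customer")
    # Each company relationship names one or more roles for that company.
    for relationship in doc.get("companyRelationships", []):
        roles.extend(relationship["personCompanyRelationship"])
    return roles


mike = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {
            "companyIdFk": 555,
            "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        },
    ],
}

# Hypothetical person with no subclass sections at all.
plain_person = {"personId": 22, "person": {"personName": "Hypothetical Person"}}
```

The base `person` section is the same for everyone, and the section names themselves (`customer`, `companyRelationships`) state the purpose of each subclass.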

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
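The reason for unique property names can be shown with a toy equality index keyed by property name and value. This is only a simplified model for illustration, not how MarkLogic implements its indexes; `index_document` and `lookup` are hypothetical helpers.

```python
from collections import defaultdict


def index_document(doc_id, node, index, name=None):
    """Index every scalar under its property name, ignoring its position."""
    if isinstance(node, dict):
        for key, value in node.items():
            index_document(doc_id, value, index, key)
    elif isinstance(node, list):
        # Array items inherit the name of the property that holds the array.
        for item in node:
            index_document(doc_id, item, index, name)
    else:
        index[(name, node)].add(doc_id)


def lookup(index, name, value):
    """Return the ids of documents holding `name` equal to `value` anywhere."""
    return index.get((name, value), set())


# Same data, once with a generic property name and once with unique names.
generic_names = {"person": {"status": "active"},
                 "customer": {"status": "closed"}}
unique_names = {"person": {"personStatus": "active"},
                "customer": {"customerStatus": "closed"}}

index = defaultdict(set)
index_document("g", generic_names, index)
index_document("u", unique_names, index)

# A name-only lookup for status = active cannot tell person from customer.
ambiguous_hits = lookup(index, "status", "active")
# With unique names, the question "is the customer active?" is unambiguous.
precise_hits = lookup(index, "customerStatus", "active")
```

With the generic name, document "g" matches a search for an active status even though its customer section is closed; the unique name avoids that false match without needing a path-based index.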


Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents (person, order, product, company, and order lookup) equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
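The IO estimate above is simple arithmetic, worked through below. The function name and its default arguments are hypothetical; the per-join figures are the ones quoted on the slide, not measurements.

```python
def join_io_range(joins, min_io_per_join=4, max_io_per_join=6):
    """Return the (low, high) IO estimate for reading data spread over joins.

    Each join is assumed to cost 3-to-4 IOs for index lookups plus
    1-to-2 IOs for the row itself, i.e. 4-to-6 IOs in total.
    """
    return joins * min_io_per_join, joins * max_io_per_join


# 13 tables joined to rebuild one person.
relational_low, relational_high = join_io_range(13)

# The slide's figure for fetching one document when indexes are in RAM.
document_io = 1
```

Under these assumptions the relational read costs between 52 and 78 IOs, against a single IO for the document.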

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]
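The denormalization step described on this slide (copying common data from another business entity into a document) can be sketched in Python. This is only an illustration under assumptions: the in-memory dictionaries, the `denormalize_order` helper, and the choice of `customerName` as the copied field are hypothetical, loosely modeled on the sample person and order documents, and are not a MarkLogic API.

```python
import copy

# Hypothetical in-memory stand-ins for the customer and order documents.
customers_by_id = {
    11: {"customerId": 11, "customerName": "Mike Bowers", "customerStatus": "active"},
}
order = {"orderId": 1, "orderNumber": "111-11-111", "customer": {"customerId": 11}}


def denormalize_order(order, customers_by_id, fields=("customerName",)):
    """Return a copy of the order with selected customer fields copied into it."""
    result = copy.deepcopy(order)  # leave the normalized order untouched
    customer = customers_by_id[result["customer"]["customerId"]]
    for field in fields:
        result["customer"][field] = customer[field]
    return result


denormalized = denormalize_order(order, customers_by_id)
print(denormalized["customer"])
```

The trade-off is the same as in relational tuning: the copied field saves a lookup at read time, but it has to be refreshed whenever the source entity changes.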

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

Relational tables shown alongside for comparison:
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
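As a rough illustration of querying a multi-valued property, the following Python sketch filters a person's addresses by purpose without any join table. The document shape follows the slide's sample; the `addresses_for` helper is a hypothetical name introduced here, and plain Python stands in for a database query.

```python
# Person document shaped like the slide's sample (values copied from the slide).
person = {
    "personId": 11,
    "person": {
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def addresses_for(person_doc, purpose):
    """Return every address whose multi-valued addressPurpose contains `purpose`."""
    entries = person_doc["person"].get("personAddresses", [])
    return [
        entry["address"]
        for entry in entries
        if purpose in entry["address"].get("addressPurpose", [])
    ]


shipping = addresses_for(person, "shipping")
billing = addresses_for(person, "billing")
print(len(shipping), len(billing))
```

One address serves both the shipping and mailing purposes here, which in the relational model would take two rows in the Persons Addresses table.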

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
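A minimal Python sketch of working with a multi-typed array: each record carries its own type tag and its own set of fields, and the code branches on the tag. The `describe` helper and the wording of its output are assumptions made for this example; only the field names come from the slide.

```python
# Two differently shaped records in one array, as on the slide.
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "nameOnCreditCard": "MIKE BOWERS",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    }},
]


def describe(method):
    """Summarize a payment method, dispatching on its paymentMethodType."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending in {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending in {method['bankAccountNumber'][-4:]}"
    return f"Unknown payment method: {kind}"


descriptions = [describe(entry["paymentMethod"]) for entry in payment_methods]
print(descriptions)
```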


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
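To make "structure can be queried" concrete, here is a small Python sketch that searches a document for the presence of a key at any nesting depth and reports where it was found. The `find_key_paths` function is a hypothetical helper written for this example, not a database API; the sample document reuses field names from the slides.

```python
def find_key_paths(node, key, path=()):
    """Yield the path to every occurrence of `key` in nested dicts and lists."""
    if isinstance(node, dict):
        for child_key, child in node.items():
            if child_key == key:
                yield path + (child_key,)
            yield from find_key_paths(child, key, path + (child_key,))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from find_key_paths(item, key, path + (position,))


doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

hidden = list(find_key_paths(doc, "hidePhoneNumber"))
numbers = list(find_key_paths(doc, "phoneNumber"))
print(hidden)
```

Because keys are ordinary data, the same traversal answers questions about presence ("which phones have hidePhoneNumber?") and about nesting ("under which parents does phoneNumber appear?").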

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
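The handful of JSON data types listed above map directly onto native types in most programming languages. The sketch below parses a small document with Python's standard `json` module and records the Python type of each top-level value; the property values (and the `personAge`, `personNickname`, and `sparseObject` properties) are made up for illustration.

```python
import json

# Hypothetical document exercising each JSON data type.
raw = """
{
  "personName": "Some very long name with international characters: Ærøskøbing 東京",
  "personHeightInFeet": 5.75,
  "personAge": 35,
  "hidePhoneNumber": true,
  "personNickname": null,
  "personPhones": [{"phoneNumber": "+1 (222) 222-2222"}, "unlisted", 42],
  "sparseObject": {}
}
"""

doc = json.loads(raw)

# Map each top-level property to the name of the Python type it was parsed into.
type_names = {key: type(value).__name__ for key, value in doc.items()}
for key, name in type_names.items():
    print(f"{key}: {name}")
```

Note that the array mixes an object, a string, and a number, and that the empty object is still a valid (sparsely populated) object.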

Document PROs: Meaningful Data Modeling

[Figure: three panels. (1) Text documents, illustrated by excerpts from E. F. Codd, "A Relational Model of Data for Large Shared Data Banks" (June 1970). (2) An RDF graph of Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes connected by predicates such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose". (3) JSON documents for Customers, Orders, Products, and Vendors plus XML documentation, connected by relationships such as "Customer Order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product".]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledgeWhat can RDF Graph and JSON do for data
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name:     Johns Hopkins
Operation Number:  13
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size   Dose UOM
Minicillan    Drugs R Us           200         mg
Maxicillan    Canada4Less Drugs    400         mg
Minicillan    Drug USA             150         mg


Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Six Data Paradigms

• Dimensional: Kimball Data Warehousing
• Relational: Fixed Dense Tables with Flexible Queries & Joins
• Wide Column: Fixed Dense Tables w/ Fixed Queries, No Joins
• Key Value: Predefined Data Structures
  – key: value 1
  – hash: [ key 1: value 1, key 2: value 2 ]
  – list: value 1, list value 2
  – set: [ set value 1, set value 2 ]
• Document: Sparse Variable Data Structures
• Graph: Unlimited Relationships

[Figure axes: Low Latency, Operational Velocity vs. High Bandwidth, Analytical Volume; response time in hours, minutes, seconds, milliseconds, microseconds; data size in PB, TB, GB; throughput of 500 tps, 1,000 tps, 10K tps, 100K tps]

Fixed structure (code has more dependence on DB structures) ... Flexible structure (code has less dependence on DB structures)

Top Ten Databases Overall

• Graph (RDF): 6 MarkLogic
• Dimensional (Data Warehouse): 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Dimensional (Live Analytics): 2 Oracle Exalytics, 3 SAP HANA
• Relational (newSQL): 2 Oracle x10
• Relational (SQL): 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Document (XML Doc Warehouse): 6 MarkLogic
• Document (JSON): 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key Value (Simple Key): 5 AWS DynamoDB, 10 Redis
• Wide Column (Complex Key): 9 DataStax Cassandra

[Figure: a sample model for each paradigm. Graph: Surgeon, Operation, Person, and Hospital nodes linked by "performed", "on", "at", "works at", "friend of", "likes", "has poor success at", and "fails at". Dimensional: a Drug Dose Facts table (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose) surrounded by Hospital, Surgeon, Operation, and Drug dimensions. Relational: Surgeon, Operation Codes, Operation, and Hospital tables related by "performed at", "schedules", "performed by", "performs", "classified by", and "classifies". Document: the hospital operation XML document.]

Do we combine multiple models?

[Figure: the paradigm chart from the "Six Data Paradigms" slide, repeated: Dimensional, Relational, Wide Column, Key Value, Document, and Graph arranged from fixed structure to flexible structure, and from low-latency operational velocity to high-bandwidth analytical volume]

Multi-Model Enterprise NoSQL Databases

[Figure: the per-paradigm sample models from the "Top Ten Databases Overall" slide, repeated, with the product rankings replaced by MarkLogic labels on the Graph (RDF), Document (XML Doc Warehouse and JSON), Key Value (Simple Key), and Wide Column (Complex Key) panels, a "MarkLogic Operational" label, and the caption "#1 Multi-model NoSQL Database"]

Power of Combining XML Document and RDF Graph

Narrative + Data = Contextual Information
+ Relationships (Semantic & Structural) = Meaningful Knowledge

A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1 INTRODUCTION. This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS. … Tables … represent a major advance toward the goal of data independence …
1.2.1 Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence. … Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: RDF graph of Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes connected by predicates: "geo-located in", "printed on", "related topic", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose"]

WAIT A MINUTE
Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy

2. They contain redundant data throughout the hierarchy that is hard to update
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities

3. They are easily inconsistent because it is easy to fail to update redundant data
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL
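The first answer above, that indexes flatten out the structure so queries work regardless of hierarchy, can be illustrated with a toy inverted index in Python. This is a conceptual sketch only: the `index_values` function, the document IDs, and the sample documents are hypothetical, and the code does not represent how MarkLogic's indexes are actually implemented.

```python
from collections import defaultdict


def index_values(doc_id, node, index, key=None):
    """Record (key, value) -> doc_id for every scalar, ignoring nesting depth."""
    if isinstance(node, dict):
        for child_key, child in node.items():
            index_values(doc_id, child, index, child_key)
    elif isinstance(node, list):
        for item in node:
            index_values(doc_id, item, index, key)  # array items keep the parent key
    else:
        index[(key, node)].add(doc_id)


# The same property sits at different depths in the two documents.
documents = {
    "person-11": {"person": {"personName": "Mike Bowers"}},
    "order-1": {"order": {"customer": {"personName": "Mike Bowers"},
                          "orderStatus": "Shipped"}},
}

index = defaultdict(set)
for doc_id, doc in documents.items():
    index_values(doc_id, doc, index)

# The lookup does not mention the hierarchy, so it survives restructuring.
matches = index[("personName", "Mike Bowers")]
print(sorted(matches))
```

A query written against the flattened index keeps working if `personName` later moves to a different level of the document, which is the access-path dependence Codd warned about.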

RDF Graph and XML enable us to turn content into meaningful knowledge.

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity

• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries

[Figure: JSON documents for Customers, Orders, Products, and Vendors, plus XML documentation (shipping instructions; customer invoices and annual reports; customer order documents and invoices; product manuals; vendor order forms and invoices), connected by graph relationships: "Customer Order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product"]
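The idea of connecting any business entity to any other with named relationships can be sketched as a list of subject-predicate-object triples. In the Python example below, the document identifiers, the predicate names, and the `match` helper are all hypothetical; a real deployment would store RDF triples and query them with SPARQL rather than filter a Python list.

```python
# Each triple links two documents (business entities) with a named relationship.
triples = [
    ("order/1", "orderedBy", "customer/11"),
    ("order/1", "orderedProduct", "product/1111"),
    ("vendor/555", "sold", "product/1111"),
    ("customer/11", "liked", "product/1111"),
    ("vendor/555", "receivedRmaOn", "product/1111"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching every part of the pattern that is not None."""
    return [
        triple for triple in triples
        if (subject is None or triple[0] == subject)
        and (predicate is None or triple[1] == predicate)
        and (obj is None or triple[2] == obj)
    ]


about_product = match(triples, obj="product/1111")
vendor_links = match(triples, subject="vendor/555")
print(len(about_product), vendor_links)
```

Adding a new kind of relationship is just adding another triple at run time; no table or schema change is needed.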

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.

Why is relational fixed and dense?

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Callouts: fixed field types; fixed table structure; fixed table relationships; each row has the same structure; fixed set of tables.

How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data

{ "_id": 1, "_type": "operation", "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins", "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz", "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },    (sparse property)
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ] },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ] } }

Callouts: variable data types; variable document types; variable relationships; variable and sparse collections; variable document structures; sparse properties; sparse denormalized properties.
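Sparse properties mean that code reading a document should expect some keys to be absent instead of NULL. The following Python sketch, using the drug entries from the slide, totals the recorded doses and lists the drugs with no dose recorded; the variable names are illustrative only.

```python
# The operation document's drug list; the second entry omits the dose properties.
operation = {
    "operationNumber": 13,
    "administeredDrugs": [
        {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
         "drugDoseSize": 200, "drugDoseUOM": "mg"},
        {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
        {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
         "drugDoseSize": 150, "drugDoseUOM": "mg"},
    ],
}

drugs = operation["administeredDrugs"]

# Sum only the entries that actually carry a dose measured in milligrams.
total_mg = sum(d["drugDoseSize"] for d in drugs if d.get("drugDoseUOM") == "mg")

# A missing key, not a NULL column, signals that no dose was recorded.
missing_dose = [d["drugName"] for d in drugs if "drugDoseSize" not in d]

print(total_mg, missing_dose)
```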

Choosing Between XML and JSON

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date, <date xsi:type="xsd:date">2017-04-02Z</date>, to be freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects:" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." } ] } ] }

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
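The contrast between the two formats shows up clearly in code. The Python sketch below reads the same sentence both ways using only the standard library: the XML version keeps running text with a date element mixed into it, while the JSON version holds the text and the date as separate objects. The sample strings are simplified from the slide (namespaces and attributes are omitted) and are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Mixed content: a date element sits in the middle of running text.
xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text.</paragraph>"
)

# The JSON equivalent pours text and data into separate typed objects.
json_text = """
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects: "},
  {"type": "date", "value": "2017-04-02Z"}
]}
"""

paragraph = ET.fromstring(xml_text)
xml_plain_text = "".join(paragraph.itertext())  # text with the tags stripped
xml_date = paragraph.find("date").text

doc = json.loads(json_text)
json_plain_text = "".join(part["value"] for part in doc["paragraph"])
json_date = next(p["value"] for p in doc["paragraph"] if p["type"] == "date")

print(xml_plain_text)
print(json_plain_text)
```

In the XML case the text is primary and the tags annotate it; in the JSON case the objects are primary and text is just one kind of value inside them.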

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.

All of this should be represented as one JSON document because it is one business entity.
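As a minimal sketch of that idea, the Python below folds rows from three of the relational tables (Persons, Person Phones, Person Emails) into one nested document. The row layout, the placeholder email address, and the helper name `build_person_doc` are illustrative assumptions, not part of the original model.

```python
import json

# Hypothetical rows, as they might come back from three of the 13 tables.
persons = [{"person_id": 11, "name": "Mike Bowers", "status": "active"}]
person_phones = [
    {"person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails = [
    {"person_id": 11, "purpose": "work", "address": "mike@example.com"},
]


def build_person_doc(person_id):
    """Fold the child rows for one person into a single nested document."""
    person = next(p for p in persons if p["person_id"] == person_id)
    return {
        "personId": person["person_id"],
        "person": {
            "personName": person["name"],
            "personStatus": person["status"],
            # Each multivalued property becomes an embedded array
            # instead of a separate table.
            "personPhones": [
                {"phonePurpose": r["purpose"], "phoneNumber": r["number"]}
                for r in person_phones if r["person_id"] == person_id
            ],
            "personEmails": [
                {"emailPurpose": r["purpose"], "emailAddress": r["address"]}
                for r in person_emails if r["person_id"] == person_id
            ],
        },
    }


doc = build_person_doc(11)
print(json.dumps(doc, indent=2))
```

The remaining tables (addresses, websites, payment methods, and so on) would fold in the same way, each as one more embedded array.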

Person ERD

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active",
      "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06",
      "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ],
          "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}



More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Diagram: four kinds of JSON documents (Customers, Orders, Products, Vendors) linked by labeled relationships: Customer Order, Product that was ordered, Vendor who sold the product]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize

• Make each attribute single valued

  – Create one column per attribute

  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

[Diagram labels: One-to-one, One-to-many, Many-to-many, Reference Tables]
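The normalization steps above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming hypothetical table and column names (`persons`, `person_phones`); it shows one multivalued attribute being moved into a one-to-many table and then read back with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- The multivalued "phones" attribute moves to its own table.
    CREATE TABLE person_phones (
        phone_id     INTEGER PRIMARY KEY,
        person_id    INTEGER NOT NULL REFERENCES persons(person_id),
        phone_number TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO persons VALUES (11, 'Mike Bowers')")
conn.executemany(
    "INSERT INTO person_phones (person_id, phone_number) VALUES (?, ?)",
    [(11, "+1 (111) 111-1111"), (11, "+1 (111) 111-1112")],
)

# Reading the person back with all phone numbers requires a join.
rows = conn.execute("""
    SELECT p.name, ph.phone_number
    FROM persons p
    JOIN person_phones ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
print(rows)
```

Every additional multivalued attribute adds another table and another join of this kind.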

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it


DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
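The idea of "denormalizing without denormalizing" can be sketched in plain Python. This is a toy model only: it does not use TDE or the Optic API, and the `documents` dictionary, the `triples` list, and the `employer_names` helper are assumptions made for illustration. The ids and names come from the person example.

```python
# Two documents keyed by id; neither copies data from the other.
documents = {
    11: {"personName": "Mike Bowers"},
    666: {"companyName": "Goliath Dairies"},
}

# A toy stand-in for a triple index: (subject, predicate, object).
triples = [
    (11, "hasEmployer", 666),
]


def employer_names(person_id):
    """Follow hasEmployer triples and read the name from the linked document."""
    return [
        documents[obj]["companyName"]
        for subj, pred, obj in triples
        if subj == person_id and pred == "hasEmployer"
    ]


names = employer_names(11)
print(names)
```

The company name is stored once, in the company document, yet a query that starts from the person can read it as though it had been copied in.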


Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Diagram labels: Business Tables, Transaction Tables, Reference Tables]


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational, because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

[The person JSON document from "One Person JSON Doc = 13 Tables" is repeated here; see its customer and companyRelationships sections.]
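One way to read the subclassing in that document is that the base `person` section sits next to optional role sections. The Python below is a minimal sketch under that assumption; the trimmed-down `person_doc`, the `ROLE_SECTIONS` tuple, and the `roles_of` helper are hypothetical names introduced only for illustration.

```python
# A trimmed-down person document; values come from the example where
# possible, and the structure is simplified for illustration.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {
            "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        }},
    ],
}

# Sections treated here as "subclasses" of the base person section.
ROLE_SECTIONS = ("customer", "companyRelationships")


def roles_of(doc):
    """Return the role sections present alongside the base person section."""
    return [name for name in ROLE_SECTIONS if name in doc]


roles = roles_of(person_doc)
print(roles)
```

A person who is not a customer would simply omit the `customer` section, so the document itself states which roles apply.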

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
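To see why unique property names matter, here is a minimal Python sketch of an index keyed on property name alone. It is not MarkLogic's implementation: `build_name_index` and the sample documents are hypothetical, and the only point is that a name-based lookup ignores where a property sits in the hierarchy.

```python
from collections import defaultdict


def build_name_index(docs):
    """Map (property name, scalar value) -> set of document ids.

    The path to the property is deliberately ignored, which mimics an
    index keyed on property name alone (an assumption for illustration).
    """
    index = defaultdict(set)

    def walk(doc_id, key, node):
        if isinstance(node, dict):
            for child_key, child in node.items():
                walk(doc_id, child_key, child)
        elif isinstance(node, list):
            # Array items inherit the name of the property holding the array.
            for item in node:
                walk(doc_id, key, item)
        elif key is not None:
            index[(key, node)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc_id, None, doc)
    return index


# Hypothetical documents: two use entity-specific names, two use a generic "name".
docs = {
    "person-11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "company-555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
    "generic-1": {"person": {"name": "Nabisco"}},
    "generic-2": {"company": {"name": "Nabisco"}},
}

index = build_name_index(docs)
unique_hits = index[("companyName", "Nabisco")]  # only the company document
generic_hits = index[("name", "Nabisco")]        # a person and a company collide
```

With a generic property name, the lookup cannot tell a person from a company without extra path or type filtering; with `companyName`, the name alone is enough.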

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
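The IO arithmetic above can be checked with a short Python sketch. The per-join costs are the slide's own estimates; the function name `join_io_range` and its parameters are hypothetical, introduced only for this illustration.

```python
def join_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return (low, high) IO estimates for assembling one entity via joins.

    Each join is assumed to cost some index IOs plus some row IOs,
    using the ranges quoted on the slide as defaults.
    """
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


# 13 tables joined to rebuild one person, per the slide's example.
relational_ios = join_io_range(13)

# One document read, assuming the index lookup is served from RAM.
document_ios = 1
```

Under these assumptions the relational estimate is between 52 and 78 IOs, against a single IO for the document read.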


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Relational tables for the same data:

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
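As a rough illustration of the contrast, the following Python sketch shreds one person document into rows for the three tables listed above. The JSON shape of the document, the function `shred_person`, and the generated `address_id` values are assumptions made for this example, not the deck's actual schema.

```python
def shred_person(doc):
    """Split one person document into rows for three relational tables."""
    person_id = doc["personId"]
    persons = [{"person_id": person_id, "name": doc["person"]["personName"]}]
    addresses = []
    persons_addresses = []
    for address_id, address in enumerate(doc["person"]["personAddresses"], start=1):
        addresses.append({
            "address_id": address_id,  # surrogate key invented for the sketch
            "street": address["addressStreet"][0],
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # One junction row per purpose: the multi-valued property becomes rows.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "person_id": person_id,
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, persons_addresses, addresses


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [{
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }],
    },
}

persons, persons_addresses, addresses = shred_person(person_doc)
```

One address with two purposes turns into one `Addresses` row and two `Persons Addresses` rows, while the document keeps all of it in a single nested array.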

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed

• Storing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
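A small Python sketch shows how application code might consume such a mixed array. The JSON shape and the helper `describe_payment_method` are hypothetical; the property names are taken from the example above.

```python
def describe_payment_method(method):
    """Summarize a payment method based on which properties it carries."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


# Two differently shaped records in the same array.
person_payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
    },
]

summaries = [describe_payment_method(m) for m in person_payment_methods]
```

In a relational model the same data would typically need either one table per payment type or one wide table with many null columns.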


Document PROs: Everything is Data

Everything is data in JSON

• Structure
• Keys
• Values
• Arrays

Because structure is data

• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values
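The idea that structure can be queried like any other data can be sketched in plain Python. The helpers `key_paths`, `has_key`, and `is_nested_under` are hypothetical names for this illustration and do not correspond to any MarkLogic API; the sample document's JSON shape is also assumed.

```python
def key_paths(node, prefix=()):
    """Yield every key path found in a nested JSON-like value."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + (key,)
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, prefix)


def has_key(node, key):
    """Query for the presence of a key anywhere in the document."""
    return any(path[-1] == key for path in key_paths(node))


def is_nested_under(node, ancestor, descendant):
    """Query for a nesting relationship between two keys."""
    return any(
        path[-1] == descendant and ancestor in path[:-1]
        for path in key_paths(node)
    )


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}

found_before = has_key(doc, "personNickname")
# Changing structure is just a data update: add a new key.
doc["person"]["personNickname"] = "Mikey"
found_after = has_key(doc, "personNickname")
nested = is_nested_under(doc, "personPhones", "hidePhoneNumber")
```

No schema migration is involved: the new key exists as soon as the data is written, and the same traversal answers questions about keys, values, and nesting.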

Document PROs: Simple Easy Data Types

– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 575
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
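These types map directly onto the native types of most programming languages. The sketch below uses Python's standard `json` module on a made-up document; the keys and values are invented for the illustration, and practical limits on name or string size depend on the particular parser or database rather than on JSON itself.

```python
import json

# A raw string, so the \u escape is decoded by the JSON parser, not by Python.
text = r'''
{
  "a name with spaces & symbols": "Zo\u00eb",
  "wholeNumber": 3,
  "decimalNumber": 3.5,
  "flag": true,
  "mixedArray": [1, "two", false, {"nested": null}],
  "sparseObject": {}
}
'''

doc = json.loads(text)

# Record the Python type that each JSON value was parsed into.
parsed_types = {key: type(value).__name__ for key, value in doc.items()}
mixed_types = [type(item).__name__ for item in doc["mixedArray"]]
```

Strings, numbers, booleans, arrays, and objects arrive as `str`, `int` or `float`, `bool`, `list`, and `dict`, with no column definitions required beforehand.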

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION. This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS. …Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence. …Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

Figure: graph of linked nodes. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.

Figure: Customer Order. JSON documents: Customers, Orders, Products, Vendors. XML documents: Documentation (illustrated with a sample hospital operation record listing drug doses). Labels: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size   Dose UOM
Minicillan    Drugs R Us            200         mg
Maxicillan    Canada4Less Drugs     400         mg
Minicillan    Drug USA              150         mg



Six Data Paradigms

• Dimensional: Kimball Data Warehousing
• Relational: Fixed Dense Tables with Flexible Queries & Joins
• Wide Column: Fixed Dense Tables w/ Fixed Queries, No Joins
• Key Value: Predefined Data Structures (key value 1; hash [ key 1 value 1 key 2 value 2 ]; list value 1 list value 2; [ set value 1 set value 2 ])
• Document: Sparse Variable Data Structures
• Graph: Unlimited Relationships

Figure axes: Low Latency, Operational Velocity versus High Bandwidth, Analytical Volume; Fixed structure (code has more dependence on DB structures) to Flexible structure (code has less dependence on DB structures). Scales shown: hours, minutes, seconds, milliseconds, microseconds; PB, TB, GB; 500 tps, 1000 tps, 10K tps, 100K tps.

Top Ten Databases Overall

Figure: example panels for each paradigm (a hospital, surgeon, operation, and drug-dose example), each labeled with ranked products. Paradigm axis: Graph, Dimensional, Relational, Document, Wide Column, Key Value.

• Graph / RDF: 6 MarkLogic
• Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
• newSQL: 2 Oracle x10
• SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• XML Doc Warehouse: 6 MarkLogic
• JSON Document: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key/Value (Simple Key): 5 AWS DynamoDB, 10 Redis
• Wide-Column (Complex Key): 9 DataStax Cassandra

Do we combine multiple models?

Figure: the six data paradigms (Dimensional, Relational, Wide Column, Key Value, Document, Graph) arranged from fixed structure to flexible structure and from low-latency operational velocity to high-bandwidth analytical volume.

Multi-Model Enterprise NoSQL Databases

Figure: example panels for Graph / RDF, XML Doc Warehouse, JSON Document, Key/Value (Simple Key), Wide-Column (Complex Key), newSQL, SQL, Live Analytics, and Data Warehouse, along the paradigm axis Graph, Dimensional, Relational, Document, Wide Column, Key Value. The label MarkLogic appears on the Graph / RDF, XML Doc Warehouse, and JSON Document panels, alongside the labels "1 Multi-model NoSQL Database" and "Operational".

Power of Combining XML Document and RDF Graph

13

A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

+ DataNarrative + Relationships= Contextual Information

1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1 Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.


(Semantic & Structural) = Meaningful Knowledge

[Figure legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event]
[Figure relationship labels: geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, purpose, solution]

WAIT A MINUTE
Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.

2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.

3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
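The first rebuttal, that indexes can flatten the hierarchy so queries survive structural change, can be illustrated with a small sketch. This is plain Python, not MarkLogic's actual index implementation; the function names `leaves` and `build_index` and the two sample documents are assumptions made for illustration only.

```python
from typing import Any, Dict, Iterator, Set, Tuple

def leaves(node: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (path, value) for every scalar value in a nested document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaves(child, path + (key,))
    elif isinstance(node, list):
        for child in node:
            yield from leaves(child, path)  # array position does not affect the path
    else:
        yield path, node

def build_index(docs: Dict[str, Any]) -> Dict[Tuple[str, Any], Set[str]]:
    """Map (property name, value) to document ids, ignoring nesting depth."""
    index: Dict[Tuple[str, Any], Set[str]] = {}
    for doc_id, doc in docs.items():
        for path, value in leaves(doc):
            if path:  # skip bare top-level scalars, which have no property name
                index.setdefault((path[-1], value), set()).add(doc_id)
    return index

# Two hypothetical versions of the same operation with different hierarchies.
docs = {
    "op-13-v1": {"operation": {"surgeonName": "Dorothy Oz", "operationNumber": 13}},
    "op-13-v2": {"operation": {"staff": {"lead": {"surgeonName": "Dorothy Oz"}},
                               "operationNumber": 13}},
}
index = build_index(docs)
matches = index[("surgeonName", "Dorothy Oz")]
```

Because the lookup key is the property name rather than the full path, the same query finds both documents even though `surgeonName` moved two levels deeper in the second one.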

14

RDF Graph and XML enable us to turn content into meaningful knowledge

What can RDF Graph and JSON do for data?

15

Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity

• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries
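A minimal sketch of the idea that relationships are data which can be added at run time: documents are keyed by id and triples connect any document to any other. The ids and names (11, 1111, 555, `likesProduct`) echo the deck's examples; the helpers `related` and `add_relationship` and the predicates `sellsProduct` and `receivedRmaOn` are hypothetical and not part of any MarkLogic API.

```python
from typing import Any, Dict, List, Tuple

# Each business entity is one document, keyed by its id.
documents: Dict[int, Dict[str, Any]] = {
    11: {"type": "customer", "name": "Mike Bowers"},
    1111: {"type": "product", "name": "Oreo Thins Sandwich Cookies"},
    555: {"type": "vendor", "name": "Nabisco"},
}

# Relationships are (subject, predicate, object) triples stored as data.
triples: List[Tuple[int, str, int]] = [
    (11, "likesProduct", 1111),
    (555, "sellsProduct", 1111),
]

def related(subject: int, predicate: str) -> List[Dict[str, Any]]:
    """Return the documents reachable from subject through predicate."""
    return [documents[o] for s, p, o in triples if s == subject and p == predicate]

def add_relationship(subject: int, predicate: str, obj: int) -> None:
    """Add a new kind of relationship at run time; no schema change is needed."""
    triples.append((subject, predicate, obj))

liked = related(11, "likesProduct")
add_relationship(555, "receivedRmaOn", 1111)
returned = related(555, "receivedRmaOn")
```

Adding `receivedRmaOn` required no new table or foreign-key column, which is the contrast the slide draws with fixed relational relationships.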

16

Customer Order

JSON

Customers

JSON

Orders

JSON

Products

XML Hospital Name John Hopkins Operation Number 13 Operation Type Heart Transplant Surgeon Name Dorothy Oz Drug Name

Drug Manufacturer

Dose Size

Dose UOM

Minicillan Drugs R Us 200 mg Maxicillan Canada4Less

Drugs 400 mg

Minicillan Drug USA 150 mg

Documentation

JSON

Vendors

Product that was ordered

Vendor who sold the product

Shipping instructions etc

Customer invoices annual reports etc

Customer order documents invoices etc

Product manual Vendor order forms invoices etc

Customer liked product

Vendor received RMA on product

RDF = Standard Meaningful Graphs

17

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation


New Data Structures for Variety and Variability

19

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense?

20

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Operation ID Drug ID Drug Dose Drug Dose UOM Sparse13 100 200 mg NULL13 101 NULL NULL13 100 150 mg NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Fixed field types

Fixed table structure

Fixed table relationships

Each row has same structure

Fixed set of tables

How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data

21

{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

[Callout on the second administered drug: sparse property]

Variable Data Types

Variable Document Types

Variable Relationships

Variable and Sparse Collections

Variable Document Structures

Sparse PropertiesSparse Denormalized

Properties

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
    freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object" }
    ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.


JSON is ideal for data structures
Developers work with data with maximal reliance on predictable structures

XML is ideal for content
Developers work with tagged content with minimal reliance on variable structure


Choosing Between XML and JSON

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.
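To make the 13-tables-versus-one-document point concrete, here is a minimal sketch that folds rows from three of those tables into a single person document. The table and property names loosely follow the deck; the snake_case column names, the `assemble_person` helper, and the example.com email address are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Rows as they might sit in three of the relational tables.
persons: List[Dict[str, Any]] = [{"person_id": 11, "name": "Mike Bowers"}]
person_phones: List[Dict[str, Any]] = [
    {"phone_id": 1, "person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"phone_id": 2, "person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails: List[Dict[str, Any]] = [
    {"email_id": 1, "person_id": 11, "purpose": "work", "address": "mike@example.com"},
]

def assemble_person(person_id: int) -> Dict[str, Any]:
    """Join the child tables on person_id and embed them as arrays."""
    person = next(row for row in persons if row["person_id"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": person["name"],
            "personPhones": [
                {"phonePurpose": row["purpose"], "phoneNumber": row["number"]}
                for row in person_phones if row["person_id"] == person_id
            ],
            "personEmails": [
                {"emailPurpose": row["purpose"], "emailAddress": row["address"]}
                for row in person_emails if row["person_id"] == person_id
            ],
        },
    }

doc = assemble_person(11)
```

The joins happen once, when the document is built; afterwards the whole business entity is read or written as one unit, and the surrogate keys `phone_id` and `email_id` are no longer needed.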

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr."],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}



More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

30

Customer Order

JSON

Customers

JSON

Orders

JSON

Products

JSON

Vendors

Product that was ordered

Vendor who sold the product

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Same Realistic Customer Order Model

Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

32

[Diagram labels: One-to-one, One-to-many, Many-to-many, Reference Tables]

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and Optic API can query it
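The idea of keeping embedded structures normalized inside the document while projecting a flat, denormalized view for queries can be sketched in plain Python. This only illustrates the concept; it is not the TDE template format or the Optic API, and the helper name `phone_rows` is hypothetical.

```python
from typing import Any, Dict, List

def phone_rows(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Project one flat row per (phone, purpose) pair, repeating the parent key."""
    return [
        {
            "personId": doc["personId"],
            "phonePurpose": purpose,
            "phoneNumber": phone["phoneNumber"],
        }
        for phone in doc["person"]["personPhones"]
        for purpose in phone["phonePurpose"]
    ]

# A trimmed-down person document with an embedded, multi-valued structure.
doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
        ]
    },
}

rows = phone_rows(doc)
primary = [row["phoneNumber"] for row in rows if row["phonePurpose"] == "primary"]
```

The document stays the single source of truth; the rows are derived from it, so the duplication of `personId` and `phoneNumber` across rows cannot drift out of sync with the stored data.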

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
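The listing above lost its punctuation in extraction. As a reading aid, here is a minimal sketch in Python (a language chosen for illustration; the deck itself shows only JSON) of how part of the person document might look as well-formed JSON. The nesting and value types are inferred from the flattened listing and are assumptions, not the author's exact document.

```python
import json

# Hypothetical reconstruction of part of the person document.
# Nesting and value types are inferred from the flattened listing.
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
    ],
    "person": {
        "personStatus": "active",
        "personName": "Mike Bowers",
        # Multi-valued property: an array of embedded phone structures.
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}


def phone_numbers(doc, purpose):
    """Return the phone numbers in one document that carry the given purpose."""
    return [
        entry["phone"]["phoneNumber"]
        for entry in doc["person"]["personPhones"]
        if purpose in entry["phone"]["phonePurpose"]
    ]


# One entity, one document: the embedded phones are read without any join.
serialized = json.dumps(person_doc)
primary = phone_numbers(json.loads(serialized), "primary")
```

Because the phones are embedded in the entity, finding the primary number is a walk over one document rather than a join across a phone table and a purpose table.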

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing: the shared data lives in the triple index rather than being copied into each document
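The idea in these bullets can be illustrated with a small simulation. The sketch below is plain Python standing in for a triple index; it does not use TDE or the Optic API, and every function name in it is hypothetical. It shows values being projected out of documents into triples, and a query following a link to read a value that was never copied into the linking document.

```python
# Illustrative only: plain Python standing in for a triple index.
# None of these names are MarkLogic APIs.
documents = {
    11: {"personName": "Mike Bowers", "hasEmployer": [666]},
    666: {"companyName": "Goliath Dairies"},
}


def extract_triples(doc_id, doc):
    """Project selected properties of one document into (subject, predicate, object) triples."""
    triples = []
    for name in ("personName", "companyName"):
        if name in doc:
            triples.append((doc_id, name, doc[name]))
    for employer_id in doc.get("hasEmployer", []):
        triples.append((doc_id, "hasEmployer", employer_id))
    return triples


# The index is derived from the documents; documents are not copied into each other.
triple_index = [
    triple
    for doc_id, doc in documents.items()
    for triple in extract_triples(doc_id, doc)
]


def objects(subject, predicate):
    """Return every object stored for a subject and predicate."""
    return [o for s, p, o in triple_index if s == subject and p == predicate]


def employer_names(person_id):
    """Follow hasEmployer links and read each linked company's name from the index."""
    return [
        name
        for company_id in objects(person_id, "hasEmployer")
        for name in objects(company_id, "companyName")
    ]
```

Here `employer_names(11)` reads the company name through the link, even though the person document stores only the company's identifier.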

Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure labels: Business Tables, Reference Tables, Transaction Tables]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one document

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because a JSON document is an object, which lends itself to subclassing

• This example shows a person being subclassed as an employee, customer, and account rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the subclassing below: the customer and companyRelationships sections extend the person with role-specific properties

personId 11

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [

company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late

]
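One way to picture subclassing in a document is to treat each optional section as a role the person plays. The sketch below is a minimal Python illustration under that assumption; the trimmed document and the `roles` helper are hypothetical, with section and property names taken from the listing above.

```python
# Hypothetical, trimmed person document; section names follow the listing above.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {
            "companyIdFk": 555,
            "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        }},
        {"company": {
            "companyIdFk": 666,
            "companyName": "Goliath Dairies",
            "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        }},
    ],
}


def roles(doc):
    """List the roles a person plays, read from the optional sections of the document."""
    found = []
    if "customer" in doc:
        found.append("customer")
    for entry in doc.get("companyRelationships", []):
        found.extend(entry["company"]["personCompanyRelationship"])
    return found
```

A person with no `customer` or `companyRelationships` section is still a valid person document; the base entity does not change when a role is added or removed.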

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality (range) indexes based on property name or hierarchical path
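To see why unique property names matter when equality lookups are keyed by name alone, consider the toy index below. It is a Python illustration of the principle only, not MarkLogic's index implementation; `index_properties` and the two sample documents are hypothetical.

```python
from collections import defaultdict


def index_properties(doc_id, node, index, name=None):
    """Index every scalar under its property name, ignoring its position in the hierarchy."""
    if isinstance(node, dict):
        for child_name, child in node.items():
            index_properties(doc_id, child, index, child_name)
    elif isinstance(node, list):
        # Array items are indexed under the name of the property that holds the array.
        for item in node:
            index_properties(doc_id, item, index, name)
    elif name is not None:
        index[(name, node)].add(doc_id)


# Generic names: "status" means two different things in one document.
generic = {"person": {"status": "active"}, "customer": {"status": "closed"}}
# Unique names: each property name carries its own meaning.
unique = {"person": {"personStatus": "active"}, "customer": {"customerStatus": "closed"}}

index = defaultdict(set)
index_properties("generic-doc", generic, index)
index_properties("unique-doc", unique, index)

# Looking for closed persons by the generic name matches a document
# whose person is active, because the customer's status shares the name.
ambiguous_hits = index.get(("status", "closed"), set())
precise_hits = index.get(("personStatus", "closed"), set())
```

With generic names the lookup returns `generic-doc` even though that person is active; with unique names the lookup for a closed person correctly returns nothing.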

Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Document PROs: Simplicity

Each Business Entity Is a JSON Document

• The five JSON documents in this example (person, order, product, company, and order lookup) equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs: Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database

• You have to join 13 tables to retrieve all of a person's data

• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row

• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
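The IO estimate in the bullets above can be reproduced with a few lines of arithmetic. This Python sketch uses only the slide's own figures (13 joins, 3-to-4 index IOs and 1-to-2 row IOs per join); the function name is hypothetical and the numbers are the presenter's estimates, not measurements.

```python
JOINS = 13          # tables joined to rebuild one person (from the slide)
INDEX_IOS = (3, 4)  # low and high estimate of index IOs per join
ROW_IOS = (1, 2)    # low and high estimate of row IOs per join


def relational_ios(joins, index_ios, row_ios):
    """Return the (low, high) IO estimate for rebuilding one entity by joining tables."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


low, high = relational_ios(JOINS, INDEX_IOS, ROW_IOS)
print(f"relational: {low}-to-{high} IOs; document: 1 IO")
```

The low end is 13 x (3 + 1) and the high end is 13 x (4 + 2), which gives the 52-to-78 range quoted on the slide.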

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [

address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States

]

Equivalent relational tables:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
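To make the comparison concrete, the Python sketch below shreds the multi-valued `personAddresses` array into rows for the Addresses and Persons Addresses tables. It is an illustration under assumptions: the column names, the `shred_addresses` helper, and the generated address IDs are hypothetical, and the Is Primary column is left out for brevity.

```python
def shred_addresses(person_id, person_addresses, first_address_id=1):
    """Split a personAddresses array into rows for two tables:
    Addresses (one row per address) and Persons Addresses (one row per purpose)."""
    address_rows, link_rows = [], []
    for offset, entry in enumerate(person_addresses):
        address = entry["address"]
        address_id = first_address_id + offset
        address_rows.append({
            "address_id": address_id,
            "street": ", ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # Each purpose in the multi-valued array becomes its own link row.
        for purpose in address["addressPurpose"]:
            link_rows.append({
                "person_id": person_id,
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return address_rows, link_rows


person_addresses = [
    {"address": {
        "addressPurpose": ["shipping", "mailing"],
        "addressStreet": ["99 Smith Dr"],
        "addressLocality": "Clinton",
        "addressRegion": "UT",
        "addressPostalCode": "84015",
        "addressCountry": "United States",
    }}
]

address_rows, link_rows = shred_addresses(11, person_addresses)
```

One embedded address with two purposes becomes one Addresses row and two Persons Addresses rows, and reading it back requires joining them to the Persons row again.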

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed

• Putting different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing

paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT

]
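In application code, a multi-typed array like this is usually handled by inspecting each record's shape. The following Python sketch assumes the JSON nesting implied by the listing above; the `describe` helper is hypothetical and the values are the sample data from the slide.

```python
# Two records of different shapes in the same array (sample values from the slide).
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "creditCardNumber": "1234123412341234",
        "nameOnCreditCard": "MIKE BOWERS",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    }},
]


def describe(method):
    """Summarize one payment method based on which properties it carries."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


summaries = [describe(entry["paymentMethod"]) for entry in payment_methods]
```

A relational design would typically need either one wide table with many null columns or a separate table per payment type; here both shapes sit side by side in one array.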


Document PROs: Everything is Data

Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays

Because structure is data:
• Structure can be changed by updating data: just write a query.
• Structure can be queried:
  - For presence of keys
  - For nested relationships
  - For any combination of nesting, keys, and values

  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
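To make "structure is data" concrete, here is a small sketch in plain Python. It queries a document for the presence of a key at any nesting depth, then changes the structure with an ordinary update. The helper `find_key_paths` and the new key name `isPrivate` are assumptions for illustration, not part of any MarkLogic API.

```python
def find_key_paths(node, key, path=()):
    """Yield the path to every occurrence of key, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,)
            yield from find_key_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key_paths(item, key, path + (i,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query the structure: where does the optional key appear?
paths_before = list(find_key_paths(doc, "hidePhoneNumber"))

# Change the structure by updating data: rename the key wherever it exists.
for entry in doc["person"]["personPhones"]:
    phone = entry["phone"]
    if "hidePhoneNumber" in phone:
        phone["isPrivate"] = phone.pop("hidePhoneNumber")

paths_after = list(find_key_paths(doc, "isPrivate"))
```

No schema migration is involved: the rename touches only the documents that actually carry the key.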

Document PROs: Simple, Easy Data Types
- Attribute names have unlimited length and can contain any characters
- Strings have unlimited size and can contain any internationalized UTF-8 characters
- Numbers contain integers or decimals
- Booleans are true or false
- Arrays can have values of any type and can mix types
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

Document PROs: Meaningful Data Modeling

[Slide graphics: an excerpt from E. F. Codd, "A Relational Model of Data for Large Shared Data Banks" (June 1970); a knowledge graph linking Topic, Person, Location, Publication, Reference, Organization, and Event nodes with labeled relationships such as "author of", "publisher of", and "geo-located in"; and a multi-model diagram connecting Customers, Orders, Products, and Vendors (JSON) with Documentation (XML).]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size   Dose UOM
Minicillan    Drugs R Us            200         mg
Maxicillan    Canada4Less Drugs     400         mg
Minicillan    Drug USA              150         mg



[Figure axes: Low-Latency Operational Velocity (hours, minutes, seconds, milliseconds, microseconds; 500 tps, 1,000 tps, 10K tps, 100K tps) and High-Bandwidth Analytical Volume (PB, TB, GB). Across the chart: fixed structure, where code has more dependence on DB structures, to flexible structure, where code has less dependence on DB structures.]

Top Ten Databases Overall

[Figure: the top ten databases placed on panels by data paradigm, with the numbers shown on the slide.]

• Graph (RDF): 6 MarkLogic. Sample graph: Surgeon, Operation, Person, and Hospital nodes connected by "performed", "on", "works at", "at", "friend of", "likes", "has poor success at", and "fails at".
• Dimensional, Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA. Sample star schema: Drug Dose Facts (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose) with Hospital, Surgeon, Operation, and Drug dimensions.
• Dimensional, Live Analytics: 2 Oracle Exalytics, 3 SAP HANA. Same star schema.
• Relational, newSQL: 2 Oracle x10. Sample ERD: Surgeon (Surgeon Number, Surgeon First Name, Surgeon Last Name); Operation Codes (Operation Code, Operation Name); Operation (Hospital Number, Operation Number, Surgeon Number, Operation Code); Hospital (Hospital Number, Hospital Name). Relationships: performed at / schedules, performed by / performs, classified by / classifies.
• Relational, SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB. Same ERD.
• Document, Doc Warehouse (XML): 6 MarkLogic. Sample: the hospital operation record with its drug dose table.
• Document, JSON: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key Value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide Column (complex key): 9 DataStax Cassandra

Paradigms: Graph, Dimensional, Relational, Document, Wide Column, Key Value

Do we combine multiple models?

• Dimensional: Kimball data warehousing
• Relational: fixed, dense tables with flexible queries and joins
• Wide Column: fixed, dense tables with fixed queries and no joins
• Key Value: predefined data structures
    key: value 1
    hash: [ key 1: value 1, key 2: value 2 ]
    list: list value 1, list value 2
    set: [ set value 1, set value 2 ]
• Document: sparse, variable data structures
• Graph: unlimited relationships


Multi-Model Enterprise NoSQL Databases

[Figure: the paradigm panels (Graph/RDF, Doc Warehouse in XML, JSON Document, Key/Value, Wide-Column, newSQL, SQL, Live Analytics, Data Warehouse) with the same sample graph, ERD, and star schema. MarkLogic is labeled on the Graph/RDF, Doc Warehouse, and JSON Document panels and captioned "1 Multi-model NoSQL Database" and "Operational". Paradigms: Graph, Dimensional, Relational, Document, Wide Column, Key Value.]

Power of Combining XML Document and RDF Graph

Narrative + Data = Contextual Information
+ Relationships (Semantic & Structural) = Meaningful Knowledge

A Relational Model of Data for Large Shared Data Banks
E. F. CODD, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION. This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency. …The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS. …Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence. …Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: knowledge graph. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Relationship labels: geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, solution, purpose.]

WAIT A MINUTE

Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes.
   - New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.

2. They contain redundant data throughout the hierarchy that is hard to update.
   - Document DBs use foreign keys and/or graphs to connect documents, which are business entities.

3. They are easily inconsistent because it is easy to fail to update redundant data.
   - Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
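The first answer, queries that work regardless of hierarchy, can be sketched in plain Python. The two order documents and the helper `values_for` below are hypothetical. A real document database would answer this from its indexes rather than by walking the tree, so treat this only as an illustration of path-independent access.

```python
def values_for(node, key):
    """Collect every value stored under key, wherever it sits in the hierarchy."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(values_for(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(values_for(item, key))
    return found


# Version 1 keeps the customer name directly on the order.
order_v1 = {"order": {"orderStatus": "Shipped", "customerName": "Mike Bowers"}}

# Version 2 moves it into a nested customer object.
order_v2 = {"order": {"orderStatus": "Shipped",
                      "customer": {"customerId": 11, "customerName": "Mike Bowers"}}}

# The same query works against both shapes, so the program does not depend
# on the access path that Codd's paper warns about.
names_v1 = values_for(order_v1, "customerName")
names_v2 = values_for(order_v2, "customerName")
```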

RDF Graph and XML enable us to turn content into meaningful knowledge.

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational

• A JSON document is a business entity.
  - A document is like a row in a table, and more.
  - A document contains all related tables required by a relational model to represent one business entity.

• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries.

RDF = Standard Meaningful Graphs

[Figure: Customers, Orders, Products, and Vendors are stored as JSON documents and Documentation as XML. Labeled relationships connect them: Customer Order; Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents, invoices, etc.; Product manual; Vendor order forms, invoices, etc.; Customer liked product; Vendor received RMA on product.]

Use Existing Ontologies for Predicate Names

• Save time and make your data easier to understand by leveraging existing relationship ontologies.

TIP: Search for ontologies at Linked Open Vocabularies (LOV).

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
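A brief sketch of what reusing existing predicate names can look like. The triples are plain Python tuples. The FOAF and Dublin Core Terms namespaces are the published ones, while the `http://example.org/` subjects and the `objects` helper are made up for this example.

```python
# Published vocabulary namespaces.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical namespace for our own entities.
EX = "http://example.org/"

# (subject, predicate, object) triples that borrow well-known predicates
# instead of inventing names such as "hasFriend" or "writtenBy".
triples = [
    (EX + "person/11", FOAF + "name", "Mike Bowers"),
    (EX + "person/11", FOAF + "knows", EX + "person/22"),
    (EX + "doc/modeling-slides", DCTERMS + "creator", EX + "person/11"),
]


def objects(subject, predicate):
    """Return the objects of all triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


friends = objects(EX + "person/11", FOAF + "knows")
authors = objects(EX + "doc/modeling-slides", DCTERMS + "creator")
```

Because `foaf:knows` and `dcterms:creator` are documented publicly, other tools and people can interpret these relationships without asking what the predicate means.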

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense?

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Callouts:
• Fixed field types
• Fixed table structure
• Fixed table relationships
• Each row has same structure
• Fixed set of tables
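The fixed, dense row structure can be demonstrated with Python's built-in `sqlite3` module. The table and column names below are adapted from the slide's drug-dose table and are otherwise assumptions. The point is only that a new attribute requires a schema change before any row can hold it, and that every existing row then carries it as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE operation_drugs ("
    " operation_id INTEGER NOT NULL,"
    " drug_id INTEGER NOT NULL,"
    " drug_dose INTEGER,"
    " drug_dose_uom TEXT)"
)
conn.executemany(
    "INSERT INTO operation_drugs VALUES (?, ?, ?, ?)",
    [(13, 100, 200, "mg"), (13, 101, None, None), (13, 100, 150, "mg")],
)

# A single row cannot simply carry an extra attribute: the column must exist first.
try:
    conn.execute(
        "INSERT INTO operation_drugs (operation_id, drug_id, drug_manufacturer)"
        " VALUES (13, 100, 'Drugs R Us')"
    )
    insert_failed = False
except sqlite3.OperationalError:
    insert_failed = True

# Changing the structure takes DDL, and the change applies to every row.
conn.execute("ALTER TABLE operation_drugs ADD COLUMN drug_manufacturer TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(operation_drugs)")]
null_manufacturers = conn.execute(
    "SELECT COUNT(*) FROM operation_drugs WHERE drug_manufacturer IS NULL"
).fetchone()[0]
```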

How does NoSQL support variety and variability?

NoSQL documents have any variety of rapidly changing structures because structure is data.

  {
    "_id": 1,
    "_type": "operation",
    "collections": [ "operation", "transplants" ],
    "operation": {
      "hospitalName": "Johns Hopkins",
      "operationTypeName": "Heart Transplant",
      "surgeonName": "Dorothy Oz",
      "operationNumber": 13,
      "administeredDrugs": [
        { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
        { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
        { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
      ]
    },
    "relations": {
      "values": [
        { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
        { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
        { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
        { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
        { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
        { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
      ]
    }
  }

The second administered drug omits the dose fields (a sparse property).

Callouts:
• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties
• Sparse Denormalized Properties
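A short sketch of how code can read sparse properties from such a document: absent keys are handled with `dict.get` rather than NULL columns. The drug records mirror the slide's sample, and the `dose_label` helper is a hypothetical name used only here.

```python
# administeredDrugs from the operation document; the second entry is sparse.
administered_drugs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def dose_label(drug):
    """Describe the dose, tolerating documents that omit the dose properties."""
    size = drug.get("drugDoseSize")
    if size is None:
        return f"{drug['drugName']}: dose not recorded"
    return f"{drug['drugName']}: {size} {drug['drugDoseUOM']}"


labels = [dose_label(drug) for drug in administered_drugs]
```

The sparse record stores nothing for the missing dose, whereas the relational row on the previous slide has to store explicit NULLs.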

Choosing Between XML and JSON

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.

  <section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <!--This is a comment-->
    <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
    <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
    <paragraph>XML allows data, such as this date <date xsi:type="xsd:date">2017-04-02Z</date>, to be
      freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
  </section>

XML
1. Document type
2. Namespaces for nested objects; comments; attributes for metadata; CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

  {
    "heading": "JSON is best for nested object DATA",
    "paragraphs": [
      { "paragraph": [
          { "type": "text", "value": "Everything is an object." }
      ] },
      { "paragraph": [
          { "type": "text", "value": "Text exists only in objects." },
          { "type": "text", "value": "Data exists in separate objects:" },
          { "type": "date", "value": "2017-04-02Z" },
          { "type": "break", "value": null },
          { "type": "text", "value": "JSON is designed for objects to…" },
          { "type": "text", "value": "be sprinkled with strings." }
      ] }
    ]
  }

JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
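The difference between text sprinkled with tags and text poured into objects shows up when the two forms are read from code. The sketch below uses only Python's standard library. The sample strings are shortened, slightly reworded versions of the slide's examples and should be read as illustrations.

```python
import json
import xml.etree.ElementTree as ET

# XML: the date is tagged in place, inside running text (mixed content).
xml_paragraph = ET.fromstring(
    "<paragraph>XML allows data, such as this date "
    "<date>2017-04-02Z</date>, to be mixed into the text.</paragraph>"
)
xml_text = "".join(xml_paragraph.itertext())   # text with the tags removed
xml_date = xml_paragraph.find("date").text     # the tagged value on its own

# JSON: every run of text and every datum is its own typed object.
json_paragraph = json.loads(
    '[{"type": "text", "value": "JSON keeps data, such as this date "},'
    ' {"type": "date", "value": "2017-04-02Z"},'
    ' {"type": "text", "value": ", in separate objects."}]'
)
json_text = "".join(part["value"] for part in json_paragraph)
json_date = next(p["value"] for p in json_paragraph if p["type"] == "date")
```

With XML the prose stays intact and the markup is layered on top. With JSON the application has to reassemble the sentence from its parts, which is why the slide recommends XML for content and JSON for data structures.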

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.
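To see why multivalued properties multiply tables, the sketch below "shreds" a cut-down person document into relational-style rows. The table names, the row layout, and the `shred` helper are assumptions for illustration. The document is abbreviated from the deck's person sample, with its punctuation restored.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"],
                       "emailAddress": "mike@somework.com"}},
        ],
    },
}


def shred(doc):
    """Split one document into rows for one table per multivalued property."""
    person_id = doc["personId"]
    person = doc["person"]
    tables = {
        "persons": [(person_id, person["personName"])],
        "person_phones": [],
        "person_emails": [],
    }
    for entry in person.get("personPhones", []):
        phone = entry["phone"]
        for purpose in phone["phonePurpose"]:  # one row per (phone, purpose)
            tables["person_phones"].append((person_id, purpose, phone["phoneNumber"]))
    for entry in person.get("personEmails", []):
        email = entry["email"]
        for purpose in email["emailPurpose"]:
            tables["person_emails"].append((person_id, purpose, email["emailAddress"]))
    return tables


tables = shred(person_doc)
```

Even this small document spreads across three tables and six rows. Reading the person back requires joining them again, whereas the document holds the whole business entity in one place.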

One Person JSON Doc = 13 Tables

  {
    "personId": 11,
    "schemas": [
      { "schema": { "type": "person", "schemaVersion": "1.0" } },
      { "schema": { "type": "customer", "schemaVersion": "1.0" } },
      { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
    ],
    "triples": [
      { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
      { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
      { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
      { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
      { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
          "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
          "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
    ],
    "person": {
      "personStatus": "active",
      "personName": "Mike Bowers",
      "personOnlineName": "MTB1",
      "personNameVariations": {
        "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
        "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
        "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
        "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
      },
      "personBirthDate": "1981-01-01",
      "personGender": "male",
      "personEthnicity": "caucasian",
      "personShippingPreference": "free",
      "personPhones": [
        { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
        { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
        { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
      ],
      "personAddresses": [
        { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
            "addressLocality": "Clinton", "addressRegion": "UT",
            "addressPostalCode": "84015", "addressCountry": "United States" } }
      ],
      "personEmails": [
        { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
        { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
      ],
      "personWebsites": [
        { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
        { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
      ],
      "personPaymentMethods": [
        { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
            "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
            "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
        { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
            "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
            "nameOnBankAccount": "Michael Bowers",
            "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
      ]
    },
    "customer": {
      "customerStatus": "active",
      "customerJoinDate": "2001-01-01",
      "customerReviewerRank": 11111
    },
    "companyRelationships": [
      { "company": { "companyIdFk": 555, "companyName": "Nabisco",
          "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
          "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
          "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
          "accountRepSupportedProducts": [
            { "product": { "productIdFk": 1111, "productName": "Oreos" } }
          ] } },
      { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
          "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
          "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
          "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
          "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
    ]
  }

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
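The contrast between one document and many tables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document below is a trimmed, simplified subset of the slide's person document (array items are not wrapped in named objects), and `shred` is a hypothetical helper, not part of any MarkLogic API.

```python
from collections import defaultdict

# Hypothetical, trimmed-down person document (subset of the slide's example).
person_doc = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
    "personEmails": [
        {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"},
    ],
}


def shred(doc):
    """Split one nested document into flat rows grouped by table name."""
    tables = defaultdict(list)
    parent_id = doc["personId"]
    scalars = {}
    for key, value in doc.items():
        if isinstance(value, list):
            # Each embedded array becomes a child table keyed back to the person.
            for item in value:
                row = {"personId": parent_id}
                for name, cell in item.items():
                    # Multi-valued cells are flattened to text for this sketch.
                    row[name] = ",".join(cell) if isinstance(cell, list) else cell
                tables[key].append(row)
        else:
            scalars[key] = value
    tables["persons"].append(scalars)
    return dict(tables)


tables = shred(person_doc)
for name, rows in tables.items():
    print(name, rows)
```

Even this small subset already needs three tables; the full person document fans out into the tables listed above.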

More Realistic Customer Order Model

[Figure: relational diagram of a customer order model made up of many interlinked tables (Order, Order Items, Lookup Lists, Fact, and others), each with column lists like those of the person tables]

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: Customer Order model as four sets of JSON documents (Customers, Orders, Products, Vendors) connected by graph relationships such as "Product that was ordered" and "Vendor who sold the product"]
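A minimal sketch of the idea, assuming in-memory stand-ins for the JSON documents and illustrative predicate names (`placedBy`, `includesProduct`, `soldBy` are hypothetical and do not come from the slides):

```python
# Hypothetical stand-ins for JSON documents, keyed by their IDs.
docs = {
    1: {"type": "order", "orderNumber": "111-11-111"},
    11: {"type": "customer", "personName": "Mike Bowers"},
    1111: {"type": "product", "productName": "Oreo Thins Sandwich Cookies"},
    555: {"type": "vendor", "companyName": "Nabisco"},
}

# (subject, predicate, object) triples that name each relationship explicitly.
triples = [
    (1, "placedBy", 11),
    (1, "includesProduct", 1111),
    (1111, "soldBy", 555),
]


def related(subject, predicate):
    """Return the documents linked from `subject` by `predicate`."""
    return [docs[o] for s, p, o in triples if s == subject and p == predicate]


products = related(1, "includesProduct")
vendors = related(1111, "soldBy")
print(products[0]["productName"], "is sold by", vendors[0]["companyName"])
```

Each document stays a whole business entity, and the predicate carries the meaning of the link.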

Customer document:

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

Order document (cut off at the slide edge where marked …):

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" …

Product document (cut off at the slide edge where marked …):

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", supplierProdu…
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1…
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …

Vendor document (cut off at the slide edge where marked …):

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" …
      { "phone": { "phonePurpose": "support" …
      { "phone": { "phonePurpose": "shipping" …
    "companyAddresses": [
      { "address": { "addressPurpose": ["shipping" …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936" …
    "companyEmails": [
      { "email": { "emailPurpose": "sales" …
    "companyWebsites": [
      { "website": { "websitePurpose": "home" …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555…
          "paymentMethodVerified": true …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…

Lookup document:

{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}


Relational Modeling: Normalize

1. Normalize

• Make each attribute single valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

[Figure: One-to-one, One-to-many, Many-to-many, Reference Tables]
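As a small illustration of the normalization steps, the sketch below uses Python's built-in `sqlite3` module to replace a multivalued phone attribute with a one-to-many join. The table and column names are assumptions adapted from the person tables shown earlier, not a schema given in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One primary key per table; phones move to a child table (one-to-many).
CREATE TABLE persons (
    person_id INTEGER PRIMARY KEY,
    name      TEXT
);
CREATE TABLE person_phones (
    phone_id      INTEGER PRIMARY KEY,
    person_id     INTEGER REFERENCES persons(person_id),
    phone_purpose TEXT,
    phone_number  TEXT
);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO person_phones VALUES (1, 11, 'mobile', '+1 (111) 111-1111');
INSERT INTO person_phones VALUES (2, 11, 'work',   '+1 (111) 111-1112');
""")

# Reassembling the person requires a join across the two tables.
rows = conn.execute("""
    SELECT p.name, ph.phone_purpose, ph.phone_number
    FROM persons p
    JOIN person_phones ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
print(rows)
```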

Doc/Graph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
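The last bullet can be pictured with a toy emulation in Python. This is not TDE or the Optic API; `extract_triples` is a hypothetical function that only mimics the idea of projecting values out of a document into (subject, predicate, object) rows, and the document is a simplified subset of the slide's example.

```python
# Simplified, hypothetical person document with one embedded array.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
        ],
    },
}


def extract_triples(doc):
    """Emit (subject, predicate, object) rows from one person document."""
    subject = doc["personId"]
    rows = [(subject, "personName", doc["person"]["personName"])]
    for phone in doc["person"]["personPhones"]:
        rows.append((subject, "phoneNumber", phone["phoneNumber"]))
    return rows


triple_rows = extract_triples(person_doc)
print(triple_rows)
```

The document remains the single source of truth; the extracted rows are a derived view that can be rebuilt at any time.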

[Figure: the person JSON document from the "One Person JSON Doc = 13 Tables" slide]

Doc/Graph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
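A rough Python sketch of "denormalizing without denormalizing", under loud assumptions: the list `triple_index` stands in for MarkLogic's triple index, and `order_view` is a hypothetical helper, not an Optic API call. The order document stores only the customer ID; the customer's name and email are joined in at query time rather than copied into the order.

```python
# Stand-in for values that a template has projected into the triple index.
triple_index = [
    (11, "personName", "Mike Bowers"),
    (11, "emailAddress", "mike@somework.com"),
    (22, "personName", "Someone Else"),
]

# The stored order keeps only a reference to the customer.
order_doc = {"orderId": 1, "customerId": 11, "orderStatus": "Shipped"}


def order_view(order, index):
    """Return the order enriched with index values for its customer."""
    view = dict(order)  # copy, so the stored document is not modified
    for subject, predicate, obj in index:
        if subject == order["customerId"]:
            view[predicate] = obj
    return view


view = order_view(order_doc, triple_index)
print(view)
```

If the customer's name changes, only the person document and its derived triples change; no order document needs to be rewritten.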

[Figure: the person JSON document from the "One Person JSON Doc = 13 Tables" slide]

Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: Business Tables, Transaction Tables, Reference Tables]

Doc/Graph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

[Figure: the customer, order, product, vendor, and lookup JSON documents from the "Same Realistic Customer Order Model" slide]
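The last bullet can be sketched as follows. The lookup values come from the lookup document on the Customer Order slide; the `invalid_fields` helper is a hypothetical illustration of how an application might use such a combined document.

```python
# One lookup document holding several lists that would each be a
# separate reference table in a relational model.
lookups = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": [
        "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock",
    ],
}


def invalid_fields(record):
    """Return the names of fields whose values are not in their lookup list."""
    return sorted(
        name
        for name, value in record.items()
        if name in lookups and value not in lookups[name]
    )


print(invalid_fields({"orderStatus": "Shipped", "productOrderStatus": "Lost"}))
```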

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational because it hides the purpose of the model

Doc/Graph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

[Figure: the person JSON document from the "One Person JSON Doc = 13 Tables" slide, with its person, customer, and companyRelationships sections]
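One way to read the subclassing in the person document is sketched below in Python. It assumes that each entry in `schemas` names a top-level section of the document and that a subclass view is the base `person` properties plus that section's properties; both are interpretations of the slide, and the helper names are hypothetical.

```python
# Simplified, hypothetical document: a person who is also a customer.
doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": "1.0"}},
        {"schema": {"type": "customer", "schemaVersion": "1.0"}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
}


def subclasses(document):
    """List the schema types this document declares."""
    return [entry["schema"]["type"] for entry in document["schemas"]]


def as_subclass(document, name):
    """Combine the base person properties with one subclass section."""
    if name not in subclasses(document):
        raise ValueError(f"document does not declare schema {name!r}")
    return {**document["person"], **document[name]}


customer_view = as_subclass(doc, "customer")
print(customer_view)
```

Adding a new role means adding a section and a schema entry to the document, with no change to existing sections.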

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

Doc/Graph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
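The naming advice can be illustrated with a toy index in Python. This is only an emulation of the idea of indexing values by property name regardless of position; it is not how MarkLogic implements its indexes, and the function names are hypothetical.

```python
from collections import defaultdict


def index_values(key, value, index):
    """Record every scalar under its property name, ignoring its position."""
    if isinstance(value, dict):
        for child_key, child_value in value.items():
            index_values(child_key, child_value, index)
    elif isinstance(value, list):
        for item in value:
            index_values(key, item, index)
    else:
        index[key].add(value)


def build_index(doc):
    index = defaultdict(set)
    index_values(None, doc, index)
    return index


# Generic names collide: a lookup on "name" cannot tell people from companies.
generic = {"person": {"name": "Mike Bowers"}, "company": {"name": "Nabisco"}}
# Unique names keep each lookup precise.
unique = {"person": {"personName": "Mike Bowers"},
          "company": {"companyName": "Nabisco"}}

generic_index = build_index(generic)
unique_index = build_index(unique)
print(sorted(generic_index["name"]))
print(sorted(unique_index["personName"]))
```

With a name-based index, a property called `personName` answers "which people are named X" directly, which is why the example documents prefix every property with its entity.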

[Figure: the person JSON document from the "One Person JSON Doc = 13 Tables" slide]

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.

Document PROs: Performance

One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
  • You have to join 13 tables to retrieve all of a person's data.
  • Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row.
  • 4-to-6 I/Os x 13 joins = 52-to-78 I/Os.
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O.
• MongoDB indexes are B-tree indexes and may require 4 I/Os.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
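
The I/O estimate above can be checked with a few lines of arithmetic. This is only a sketch of the slide's back-of-the-envelope model; the function name `join_io_range` and its parameters are illustrative, and the per-join costs are the slide's figures, not measurements.

```python
def join_io_range(num_joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return the (low, high) I/O count for data spread across num_joins joins.

    index_ios and row_ios are (low, high) estimates per join, taken from the slide.
    """
    per_join_low = index_ios[0] + row_ios[0]    # 3 + 1 = 4 I/Os
    per_join_high = index_ios[1] + row_ios[1]   # 4 + 2 = 6 I/Os
    return num_joins * per_join_low, num_joins * per_join_high


relational_low, relational_high = join_io_range(13)
document_ios = 1  # the slide's figure for a document fetch when indexes are in RAM

print(f"relational: {relational_low}-to-{relational_high} I/Os")
print(f"document:   {document_ios} I/O")
```

Under these assumptions the relational read costs between 52 and 78 I/Os, against a single I/O for the document.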

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Relational tables for the same data:
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
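
To make the comparison concrete, here is a minimal Python sketch that shreds one person document into rows for the three tables listed above. The exact JSON punctuation of the slide's document is not recoverable from this text, so the dictionary layout, the `shred_addresses` helper, and the snake_case column names are all assumptions made for illustration.

```python
# One document holds the person and every address, including multi-valued purposes.
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personAddresses": [
        {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }
    ],
}


def shred_addresses(doc):
    """Split one person document into rows for Persons, Addresses, and Persons Addresses."""
    persons = [{"person_id": doc["personId"], "name": doc["personName"]}]
    addresses = []
    persons_addresses = []
    for address_id, addr in enumerate(doc["personAddresses"], start=1):
        addresses.append({
            "address_id": address_id,
            "street": "\n".join(addr["addressStreet"]),  # street lines collapse into one column
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postal_code": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        # Each purpose in the array becomes its own row in the join table.
        for purpose in addr["addressPurpose"]:
            persons_addresses.append({
                "person_id": doc["personId"],
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred_addresses(person)
print(len(persons), len(addresses), len(persons_addresses))
```

The single array property `addressPurpose` is what forces the extra join table on the relational side.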

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
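
A short Python sketch of how application code might consume such a mixed array. The dictionary layout is an assumed reconstruction of the slide's JSON, and `describe` is a hypothetical helper; the point is only that each record is handled according to the keys it actually carries.

```python
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    },
]


def describe(method):
    """Summarize a payment method based on which keys are present."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


descriptions = [describe(m) for m in payment_methods]
for line in descriptions:
    print(line)
```

In a relational design the same data would typically need either one table per payment type or a single wide table with many nullable columns.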

Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query.
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
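
As a rough illustration of querying and changing structure, the Python sketch below searches a document for a key at any nesting depth and then changes the structure with an ordinary update. It uses plain dictionaries, not a MarkLogic API; `find_paths` and the sample document layout are assumptions for illustration.

```python
def find_paths(node, key, path=()):
    """Yield the path to every occurrence of key, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,)
            yield from find_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_paths(item, key, path + (i,))


person = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}

# Query the structure: which phones carry the optional key?
before = list(find_paths(person, "hidePhoneNumber"))

# Change the structure by updating data: give every phone the key.
for phone in person["person"]["personPhones"]:
    phone.setdefault("hidePhoneNumber", False)

after = list(find_paths(person, "hidePhoneNumber"))
print(before)
print(after)
```

No schema migration is involved: the new key exists as soon as the data is written.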

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 5.75
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]

Document PROs: Simple, Easy Data Types
• Attribute names have unlimited length and can contain any characters
• Strings have unlimited size and can contain any internationalized UTF-8 characters
• Numbers contain integers or decimals
• Booleans are true or false
• Arrays can have values of any type and can mix types
• Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
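
The data types listed above can be seen by parsing a small document with Python's standard `json` module. The JSON text below is an assumed reconstruction in the spirit of the slide's example; the name value and the `mixedArray` property are invented purely to show mixed types.

```python
import json

raw = """
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with international characters: Zoë Łukasz 山田",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
      {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true}
    ],
    "mixedArray": ["text", 42, 5.75, true, null, {"nested": []}]
  }
}
"""

doc = json.loads(raw)
person = doc["person"]

# Arrays can mix strings, numbers, booleans, null, and nested objects.
type_names = [type(value).__name__ for value in person["mixedArray"]]
print(type_names)

# Objects can be sparsely populated: only the second phone has hidePhoneNumber.
sparse_flags = ["hidePhoneNumber" in phone for phone in person["personPhones"]]
print(sparse_flags)
```

Note that JSON has no date type, so `personBirthDate` arrives as a string and any date handling is up to the application or the database's indexes.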

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1 Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: knowledge graph linking the paper to related entities. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: JSON document collections (Customers, Orders, Products, Vendors) and XML documentation connected by graph relationships. The XML example describes an operation: Hospital Name John Hopkins, Operation Number 13, Operation Type Heart Transplant, Surgeon Name Dorothy Oz, with a drug table (Drug Name, Drug Manufacturer, Dose Size, Dose UOM). Relationship labels: Customer Order; Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Inside and Out

Drug Name    Drug Manufacturer    Dose Size    Dose UOM
Minicillan   Drugs R Us           200          mg
Maxicillan   Canada4Less Drugs    400          mg
Minicillan   Drug USA             150          mg

Hospital Name       John Hopkins
Operation Number    13
Operation Type      Heart Transplant
Surgeon Name        Dorothy Oz

Do we combine multiple models?

[Figure: six data paradigms positioned by latency, volume, and structural flexibility.
Paradigms:
• Dimensional: Kimball Data Warehousing
• Relational: Fixed, Dense Tables with Flexible Queries & Joins
• Wide Column: Fixed, Dense Tables w/ Fixed Queries, No Joins
• Key Value: Predefined Data Structures (key: value 1; hash [ key 1: value 1, key 2: value 2 ]; list value 1, list value 2; [ set value 1, set value 2 ])
• Document: Sparse, Variable Data Structures
• Graph: Unlimited Relationships
Axis labels:
• Low Latency, Operational Velocity to High Bandwidth, Analytical Volume
• Hours, minutes, seconds, milliseconds, microseconds
• PB, TB, GB
• 500 tps, 1000 tps, 10K tps, 100K tps
• Fixed structure (code has more dependence on DB structures) to Flexible structure (code has less dependence on DB structures)]

Multi-Model Enterprise NoSQL Databases

[Figure: one multi-model NoSQL database spanning the Graph, Dimensional, Relational, Document, Wide Column, and Key Value paradigms. Several panels are labeled MarkLogic.
• Graph/RDF: Person, Surgeon, Operation, and Hospital nodes with edge labels performed, on, works at, at, friend of, likes, has poor success at, fails at.
• Doc Warehouse (XML, JSON Document): Hospital Name John Hopkins, Operation Number 13, Operation Type Heart Transplant, Surgeon Name Dorothy Oz, with a drug table (Drug Name, Drug Manufacturer, Dose Size, Dose UOM): Minicillan, Drugs R Us, 200 mg; Maxicillan, Canada4Less Drugs, 400 mg; Minicillan, Drug USA, 150 mg.
• Key/Value: simple key. Wide-Column: complex key.
• SQL and newSQL (Operational): Surgeon (Surgeon Number, Surgeon First Name, Surgeon Last Name); Operation Codes (Operation Code, Operation Name); Operation (Hospital Number, Operation Number, Surgeon Number, Operation Code); Hospital (Hospital Number, Hospital Name). Relationship labels: performed at, schedules, performed by, performs, classified by, classifies.
• Data Warehouse and Live Analytics: Drug Dose Facts (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose) with Hospital, Surgeon, Operation, and Drug dimensions, each a key plus attributes.]

Power of Combining XML Document and RDF Graph

+ Data/Narrative + Relationships = Contextual Information
(Semantic & Structural) = Meaningful Knowledge

[Figure: an XML document (an excerpt from E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks") linked to an RDF graph. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, purpose, solution.]

WAIT A MINUTE

Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes.
   • New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   • Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   • Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
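
The second point, connecting documents by foreign key instead of copying data into them, can be sketched in plain Python. The documents below reuse property names from the slides (`productIdFk`, `productName`, `productOrderQty`), but the in-memory index and the `resolve_products` helper are illustrative assumptions, not a MarkLogic API.

```python
# Product documents, indexed by their id (a stand-in for a database lookup).
products = {
    1111: {"productId": 1111, "productName": "Oreo Thins Sandwich Cookies"},
    2222: {"productId": 2222, "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz"},
}

# The order document stores only a foreign key to each product.
order = {
    "orderId": 1,
    "orderedProducts": [
        {"productIdFk": 1111, "productOrderQty": 3},
        {"productIdFk": 2222, "productOrderQty": 1},
    ],
}


def resolve_products(order_doc, product_index):
    """Join an order to its product documents through productIdFk."""
    lines = []
    for line in order_doc["orderedProducts"]:
        product = product_index[line["productIdFk"]]
        lines.append((product["productName"], line["productOrderQty"]))
    return lines


order_lines = resolve_products(order, products)

# Because the name lives in one place, a single update is seen by every order.
products[1111]["productName"] = "Oreo Thins"
updated_lines = resolve_products(order, products)
print(order_lines)
print(updated_lines)
```

If the product name is also denormalized into the order for read speed, as the slides' order document does, keeping the copies in sync becomes the job of the constraint or trigger mechanism mentioned in point 3.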

RDF Graph and XML enable us to turn content into meaningful knowledge.

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  • A document is like a row in a table, and more
  • A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way and in as many ways as you want, at run time, across all table boundaries
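
A minimal sketch of the idea, using plain Python tuples as subject-predicate-object triples. Only the `hasEmployer` triple (subject 11, object 666) comes from the deck's person document; the identifier format, the other predicate names, and the `match` helper are hypothetical.

```python
# Each triple connects one business entity (document) to another.
triples = [
    ("person/11", "hasEmployer", "company/666"),
    ("person/11", "placedOrder", "order/1"),
    ("order/1", "containsProduct", "product/1111"),
    ("company/555", "sellsProduct", "product/1111"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return every triple matching the given pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# A new kind of relationship is just more data, added at run time.
triples.append(("person/11", "likesProduct", "product/1111"))

about_product = match(triples, obj="product/1111")
predicates = [predicate for _, predicate, _ in about_product]
print(predicates)
```

No table or foreign-key column had to be declared before the new `likesProduct` relationship could be stored and queried.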

[Figure: JSON document collections (Customers, Orders, Products, Vendors) and XML documentation connected by graph relationships. The XML example describes an operation: Hospital Name John Hopkins, Operation Number 13, Operation Type Heart Transplant, Surgeon Name Dorothy Oz, with a drug table (Drug Name, Drug Manufacturer, Dose Size, Dose UOM). Relationship labels: Customer Order; Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents, invoices, etc.; Product manual; Vendor order forms, invoices, etc.; Customer liked product; Vendor received RMA on product.]

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.

TIP: Search for ontologies at Linked Open Vocabularies (LOV).

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

• Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
  NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

• Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
  NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

• Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
  NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

• Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
  NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

• Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
  NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense?

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL | NULL
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Callouts: Fixed field types. Fixed table structure. Fixed table relationships. Each row has same structure. Fixed set of tables.
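The fixed-and-dense behaviour described above can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table and column names (`drug`, `operation_drug`, `drug_manufacturer`) are assumptions chosen to mirror the slide, and SQLite stands in for any relational engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The structure must exist before any data can be loaded.
conn.executescript("""
CREATE TABLE drug (
    drug_id   INTEGER PRIMARY KEY,
    drug_name TEXT NOT NULL
);
CREATE TABLE operation_drug (
    operation_id  INTEGER NOT NULL,
    drug_id       INTEGER NOT NULL REFERENCES drug(drug_id),
    drug_dose     INTEGER,   -- nullable, so values may be sparse
    drug_dose_uom TEXT       -- nullable, so values may be sparse
);
""")

conn.executemany("INSERT INTO drug VALUES (?, ?)",
                 [(100, "Minocycline"), (101, "Minomycin")])
conn.executemany("INSERT INTO operation_drug VALUES (?, ?, ?, ?)",
                 [(13, 100, 200, "mg"), (13, 101, None, None), (13, 100, 150, "mg")])

# Every row has the same columns; a missing value is stored as NULL (None).
rows = conn.execute("""
    SELECT d.drug_name, od.drug_dose, od.drug_dose_uom
    FROM operation_drug AS od
    JOIN drug AS d ON d.drug_id = od.drug_id
    WHERE od.operation_id = 13
    ORDER BY od.rowid
""").fetchall()

# Recording a new attribute means changing the schema for all rows first.
conn.execute("ALTER TABLE operation_drug ADD COLUMN drug_manufacturer TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(operation_drug)")]
```

The Minomycin row still carries both dose columns, just filled with NULL, and the manufacturer could not be stored at all until the table itself was altered.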

How does NoSQL support variety and variability?

NoSQL documents have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Callouts: Variable Data Types. Variable Document Types. Variable Relationships. Variable and Sparse Collections. Variable Document Structures. Sparse Properties (the Minomycin entry is marked "sparse property": it has no dose properties). Sparse Denormalized Properties.
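To make "structure is data" concrete, here is a minimal Python sketch that treats the operation document as a plain dictionary and queries it without any predefined schema. The `hospital` document, the `collection` list, and the `drugs_without_dose` helper are hypothetical additions for illustration; only the operation fields come from the slide.

```python
# Trimmed copy of the operation document from the slide.
operation = {
    "_id": 1,
    "_type": "operation",
    "operation": {
        "surgeonName": "Dorothy Oz",
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin"},  # sparse: the dose properties are simply absent
            {"drugName": "Minocycline", "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
}

# Hypothetical second document of a different type in the same collection.
hospital = {"_id": 10, "_type": "hospital", "hospital": {"hospitalName": "Johns Hopkins"}}

collection = [operation, hospital]


def drugs_without_dose(doc):
    """Return names of administered drugs that carry no dose property."""
    drugs = doc.get("operation", {}).get("administeredDrugs", [])
    return [d["drugName"] for d in drugs if "drugDoseSize" not in d]


# The same query runs over every document type; documents that lack the
# structure contribute nothing instead of raising an error.
missing = [name for doc in collection for name in drugs_without_dose(doc)]

# Sparse properties are handled by supplying a default at read time.
total_mg = sum(d.get("drugDoseSize", 0)
               for d in operation["operation"]["administeredDrugs"])
```

Nothing had to be declared before the hospital document joined the collection, and the absent dose is represented by omission rather than by a NULL placeholder.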

Choosing Between XML and JSON

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date, <date xsi:type="xsd:date">2017-04-02Z</date>, to be
    freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects:" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." } ] } ] }

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
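The difference between "tags added on top of text" and "text poured into objects" shows up as soon as the two samples are parsed. The following sketch uses only the Python standard library; the shortened sample strings are adaptations of the slide content, and the variable names are assumptions.

```python
import json
import xml.etree.ElementTree as ET

# XML: the date element sits inside running text (mixed content).
xml_text = (
    "<paragraph>XML allows data, such as this date, "
    "<date>2017-04-02Z</date>, to be freely mixed into the text.</paragraph>"
)
para = ET.fromstring(xml_text)
xml_plain = "".join(para.itertext())   # the text still reads as one sentence
xml_date = para.find("date").text      # the tagged data is addressable too

# JSON: text and data live in separate objects inside an array.
json_text = """
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects: "},
  {"type": "date", "value": "2017-04-02Z"}
]}
"""
doc = json.loads(json_text)
json_plain = "".join(p["value"] for p in doc["paragraph"] if p["type"] == "text")
json_date = next(p["value"] for p in doc["paragraph"] if p["type"] == "date")
```

Both formats expose the same date, but in XML it stays embedded in the sentence, whereas in JSON the application has to reassemble the prose from typed fragments.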

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Customer Model

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
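To see why one document corresponds to many tables, the sketch below "shreds" a trimmed copy of the person document into relational-style rows. The table names, the surrogate `phone_id`/`email_id` counters, and the `shred` helper are assumptions for illustration, and the nested `phone`/`email` wrappers are flattened for brevity. Joining the purpose list into one comma-separated column is a shortcut; strict normalization would need yet another table for it.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
        ],
        "personEmails": [
            {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"},
        ],
    },
}


def shred(doc):
    """Split one person document into rows for three hypothetical tables."""
    pid = doc["personId"]
    person = doc["person"]
    tables = {"persons": [], "person_phones": [], "person_emails": []}

    # Single-valued properties stay on the parent row.
    tables["persons"].append({"person_id": pid, "name": person["personName"]})

    # Each multivalued property becomes its own table keyed back to the person.
    for i, phone in enumerate(person.get("personPhones", []), start=1):
        tables["person_phones"].append({
            "phone_id": i,
            "person_id": pid,
            "phone_purpose": ",".join(phone["phonePurpose"]),
            "phone_number": phone["phoneNumber"],
        })
    for i, email in enumerate(person.get("personEmails", []), start=1):
        tables["person_emails"].append({
            "email_id": i,
            "person_id": pid,
            "email_purpose": ",".join(email["emailPurpose"]),
            "email_address": email["emailAddress"],
        })
    return tables


tables = shred(person_doc)
row_counts = {name: len(rows) for name, rows in tables.items()}
```

Every array in the document adds a table and a foreign key, which is how the full person entity grows to 13 tables in the relational model.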


More Realistic Customer Order Model

[Figure: entity-relationship diagram of the more realistic customer order model, including Order, Order Items, Lookup Lists, and Fact tables]

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities

[Figure: Customer Order document graph. JSON collections: Customers, Orders, Products, Vendors. Relationship labels: Product that was ordered; Vendor who sold the product]

Same Realistic Customer Order Model

(The right-hand edges of the order, product, and company documents are cropped on the slide; cropped text is marked with ….)

Customer document:

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

Order document:

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
          … } }
    ]
  }
}

Product document:

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", … } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1, …
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7, … } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…,
          "productWidth": 5, "productHeight": 2, "productLength": 8, … } } ],
    …
  }
}

Company (vendor) document:

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": "555…",
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}

Order lookup document:

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}

Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

Diagram labels: One-to-one, One-to-many, Many-to-many, Reference Tables

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
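The idea of projecting triples out of a normalized document can be imitated in a few lines of plain Python. This is not MarkLogic's TDE and uses none of its template syntax; `extract_triples`, the predicate names `hasName` and `hasCompanyRelationship`, and the flattened document shape are all hypothetical stand-ins meant only to show the pattern.

```python
# Simplified person document: one business entity with an embedded array.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "companyRelationships": [
        {"companyIdFk": 555, "companyName": "Nabisco"},
        {"companyIdFk": 666, "companyName": "Goliath Dairies"},
    ],
}


def extract_triples(doc):
    """Project (subject, predicate, object) tuples from one document.

    The document is left untouched; the triples are a derived index that
    can be rebuilt at any time from the source document.
    """
    subject = doc["personId"]
    triples = [(subject, "hasName", doc["person"]["personName"])]
    for rel in doc.get("companyRelationships", []):
        triples.append((subject, "hasCompanyRelationship", rel["companyIdFk"]))
    return triples


triple_index = extract_triples(person_doc)
```

The document remains the single normalized source of truth, while the derived triples give queries a flat, denormalized view of the same facts.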


DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
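"Denormalizing without denormalizing" can be illustrated with a toy in-memory triple list: the spouse's name is never copied into person 11's document, yet a query can reach it by following a linking triple. This sketch does not use the Optic API; the `objects` and `spouse_names` helpers, the `personName` predicate, and the name given to person 22 are hypothetical.

```python
# Triples as they might be projected from two separate person documents.
triples = [
    (11, "hasSpouse", 22),              # link between two documents
    (11, "personName", "Mike Bowers"),  # fact from document 11
    (22, "personName", "Pat Example"),  # fact from document 22 (made-up name)
]


def objects(subject, predicate):
    """Return every object recorded for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def spouse_names(person_id):
    """Follow hasSpouse links, then read the name from the linked entity."""
    return [
        name
        for spouse_id in objects(person_id, "hasSpouse")
        for name in objects(spouse_id, "personName")
    ]


result = spouse_names(11)
```

If person 22 changes their name, only document 22 and its projected triple change; nothing stored with person 11 needs to be updated.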


Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

Diagram labels: Business Tables, Transaction Tables, Reference Tables


paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc
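The last two bullets can be sketched in Python. This is only an illustration: the slide text lost its JSON punctuation, so the quoting and nesting below are assumptions, the order is abbreviated, and `invalid_statuses` is a hypothetical helper, not part of any MarkLogic API.

```python
# One lookup document combining what would be two relational reference tables.
order_lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": [
        "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock",
    ],
}

# One order document holding the whole transaction (abbreviated from the slide).
order = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {
                "productIdFk": 1111,
                "productName": "Oreo Thins Sandwich Cookies",
                "productOrderStatus": "Shipped",
            },
        ],
    },
}


def invalid_statuses(order_doc, lookup_doc):
    """Return (property, value) pairs whose value is missing from the lookup doc."""
    problems = []
    body = order_doc["order"]
    if body["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(("orderStatus", body["orderStatus"]))
    for product in body["orderedProducts"]:
        status = product["productOrderStatus"]
        if status not in lookup_doc["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems
```

Calling `invalid_statuses(order, order_lookup)` needs only the two documents: the order carries its own customer and product details, and both status lists live in one lookup document.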

Relational Modeling Generalize

3 Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
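One way to read the subclassing in this document is that each optional section (customer, companyRelationships) adds a role to the base person. The Python sketch below assumes that reading. The document is abbreviated, its quoting is reconstructed, and `roles_of` is a hypothetical helper.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {
            "companyIdFk": 555,
            "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        },
        {
            "companyIdFk": 666,
            "companyName": "Goliath Dairies",
            "personCompanyRelationship": [
                "independentContractorFor", "deliveryPersonFor",
            ],
        },
    ],
}


def roles_of(doc):
    """List the roles a person document plays, based on which sections exist."""
    roles = ["person"] if "person" in doc else []
    if "customer" in doc:
        roles.append("customer")
    for relationship in doc.get("companyRelationships", []):
        roles.extend(relationship["personCompanyRelationship"])
    return roles
```

A document with only the `person` section is still valid. Adding a role means adding a section rather than altering a table.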

Relational Modeling Tune

4 Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality (range) indexes based on property name or hierarchical path
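The naming advice is easier to see with a toy index. The Python sketch below is not MarkLogic's implementation. It is a simplified model, with hypothetical function names, of an equality index keyed by property name and value that ignores where the property sits in the hierarchy.

```python
from collections import defaultdict


def walk(node, name=None):
    """Yield (property_name, scalar_value) pairs from every depth of a document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, name)
    elif name is not None:
        yield name, node


def build_equality_index(docs):
    """Map (property_name, value) to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for name, value in walk(doc):
            index[(name, value)].add(doc_id)
    return index


docs = {
    "person-11": {
        "personId": 11,
        "person": {
            "personName": "Mike Bowers",
            "companyRelationships": [
                {"companyIdFk": 555, "companyName": "Nabisco"},
            ],
        },
    },
    "company-555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

index = build_equality_index(docs)
```

Here `index[("companyName", "Nabisco")]` matches both documents, while `index[("personName", "Mike Bowers")]` matches only the person. If both properties were simply called `name`, a lookup by name could not tell a person from a company.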

Agenda

1 Database Modeling Paradigms

2 From Relational to Document Graph

3 Document Advantages

Document PROs Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database

• You have to join 13 tables to retrieve all of a person's data

• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row

• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
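The IO estimate above is simple arithmetic, worked here in Python. The per-join figures are the slide's own rule of thumb, not measurements, and the function names are hypothetical.

```python
def ios_per_join(index_ios=(3, 4), row_ios=(1, 2)):
    """Combine the low and high IO estimates for index reads and row reads."""
    return index_ios[0] + row_ios[0], index_ios[1] + row_ios[1]


def relational_io_range(join_count):
    """Estimated (low, high) IOs to assemble one entity from join_count tables."""
    low, high = ios_per_join()
    return join_count * low, join_count * high


# The slide's example: a person spread across 13 tables.
person_low, person_high = relational_io_range(13)
```

With these inputs `ios_per_join()` gives `(4, 6)` and `relational_io_range(13)` gives `(52, 78)`, compared with the single IO the slide attributes to reading one document.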

Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Equivalent relational tables:

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
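To make the comparison concrete, the Python sketch below shreds one person document into rows for the three tables named above. The JSON punctuation is reconstructed, the row column names are illustrative, and the `addressId` surrogate key is invented here because the document itself has no address ID.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }
        ],
    },
}


def shred_person(doc):
    """Split one person document into rows for three relational tables."""
    person = doc["person"]
    persons = [{"personId": doc["personId"], "name": person["personName"]}]
    addresses = []
    persons_addresses = []
    for address_id, address in enumerate(person["personAddresses"], start=1):
        addresses.append({
            "addressId": address_id,  # surrogate key invented for this sketch
            "street": ", ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postalCode": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # Each value of the multi-valued addressPurpose becomes a join-table row.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "personId": doc["personId"],
                "addressId": address_id,
                "addressPurpose": purpose,
            })
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred_person(person_doc)
```

One address with two purposes becomes one Persons row, one Addresses row, and two Persons Addresses rows. In the document the same facts stay together in a single array.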

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed

• Storing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
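Application code can handle such a mixed array by checking which properties each record carries. The Python sketch below is abbreviated from the slide's data, with reconstructed quoting. The `describe_payment_method` helper is hypothetical.

```python
person_payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
    },
]


def describe_payment_method(method):
    """Summarize a payment method based on the properties it actually has."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


summaries = [describe_payment_method(m) for m in person_payment_methods]
```

Both record shapes sit in the same array. A relational design would typically need a table per payment type or one wide table full of nulls.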

Document PROs Everything is Data

Everything is data in JSON

• Structure
• Keys
• Values
• Arrays

Because structure is data

• Structure can be changed by updating data: just write a query

• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

personId 11

person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 575
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
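Querying structure "for presence of keys" can be sketched with a short recursive walk in plain Python. This runs outside any database and shows only the idea, not a MarkLogic query. The `paths_with_key` helper is hypothetical and the document is abbreviated with reconstructed quoting.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {
                "phonePurpose": ["home"],
                "phoneNumber": "+1 (222) 222-2224",
                "hidePhoneNumber": True,
            },
        ],
    },
}


def paths_with_key(node, key, path=()):
    """Yield the path to every occurrence of key, at any depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            here = path + (name,)
            if name == key:
                yield here
            yield from paths_with_key(value, key, here)
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from paths_with_key(item, key, path + (position,))


hidden = list(paths_with_key(person_doc, "hidePhoneNumber"))
```

Only the second phone carries `hidePhoneNumber`, so the search returns a single path. No schema change was needed to add that key to one record and not the other.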

Document PROs Simple Easy Data Types

- Attribute names have unlimited length and can contain any characters
- Strings have unlimited size and can contain any internationalized UTF-8 characters
- Numbers contain integers or decimals
- Booleans are true or false
- Arrays can have values of any type and can mix types
- Objects have unlimited number of attributes, can be nested, and can be sparsely populated
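These types can be checked by parsing a small document with Python's standard `json` module. Every property name and value below is made up for illustration.

```python
import json

raw = """
{
  "personName": "Zoë Müller",
  "personId": 11,
  "exampleDecimal": 5.75,
  "hidePhoneNumber": true,
  "mixedArray": [1, "two", false, {"nested": []}],
  "sparseObject": {}
}
"""

doc = json.loads(raw)

# Record which Python type each top-level JSON value was parsed into.
type_names = {key: type(value).__name__ for key, value in doc.items()}
```

The parser maps the string, integer, decimal, boolean, array, and object to `str`, `int`, `float`, `bool`, `list`, and `dict`. The mixed array keeps its four differently typed members in order.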

Document PROs Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6

June 1970

Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION

This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS

… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence

… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one

1.2.2. Indexing Dependence

… Can application programs … remain invariant as indices come and go …

1.2.3. Access Path Dependence

Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths

[Figure: a graph linking Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes with labeled relationships: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution]

[Figure: Customer Order. Customers, Orders, Products, and Vendors are stored as JSON documents alongside an XML record and Documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manual). Labeled relationships: product that was ordered, vendor who sold the product, customer liked product, vendor received RMA on product]

XML example: Hospital Name John Hopkins, Operation Number 13, Operation Type Heart Transplant, Surgeon Name Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size    Dose UOM
Minicillan   Drugs R Us           200          mg
Maxicillan   Canada4Less Drugs    400          mg
Minicillan   Drug USA             150          mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models?
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense?
  • How does NoSQL support variety and variability?
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like?
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling: Normalize
  • DocGraph Modeling: Normalize
  • DocGraph Modeling: Denormalize
  • Relational Modeling: Orthogonalize
  • Slide Number 36
  • Relational Modeling: Generalize
  • DocGraph Modeling: Generalize
  • Relational Modeling: Tune
  • DocGraph Modeling: Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs: Simple, Easy Data Types
  • Slide Number 49

Hospital Name: Johns Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg



Multi-Model Enterprise NoSQL Databases

[Figure: six data paradigms (Graph, Dimensional, Relational, Document, Wide Column, Key Value) positioned by workload and structure.
Axes: Low-Latency Operational Velocity (500 tps, 1,000 tps, 10K tps, 100K tps); High-Bandwidth Analytical Volume (GB, TB, PB); time scale (hours, minutes, seconds, milliseconds, microseconds); Fixed structure (code has more dependence on DB structures) to Flexible structure (code has less dependence on DB structures).
Panels: Graph/RDF (MarkLogic), with Surgeon, Operation, Person, and Hospital nodes joined by edges such as "performed", "works at", "friend of", "likes", "has poor success at", and "fails at"; XML and JSON Document, Doc Warehouse (MarkLogic); Key/Value (simple key); Wide-Column (complex key); SQL and newSQL (Surgeon, Operation, Operation Codes, and Hospital tables); Live Analytics and Data Warehouse (Hospital, Surgeon, Operation, and Drug dimensions around a Drug Dose Facts table).
Overlay: MarkLogic, 1 Multi-model NoSQL Database, Operational.]

Power of Combining XML Document and RDF Graph

A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION. This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS. … Tables … represent a major advance toward the goal of data independence …
1.2.1. Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. … Can application programs … remain invariant as indices come and go? …
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

Narrative + Data + Relationships (Semantic & Structural) = Contextual Information = Meaningful Knowledge

[Figure: graph of entities tagged in the paper above. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, solution, purpose.]

WAIT A MINUTE!
Aren't JSON, XML, and RDF databases hierarchical and graph?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
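The first answer above (indexes that flatten the structure so queries work regardless of hierarchy) can be illustrated with a small sketch. This is not MarkLogic's index implementation; the `index_properties` helper and the two sample documents are assumptions made up for illustration. The idea is only that a lookup keyed on (property name, value) keeps working when the same property moves to a different depth.

```python
from collections import defaultdict


def index_properties(doc_id, node, index):
    """Record each (property name, scalar value) pair, ignoring nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                index_properties(doc_id, value, index)
            else:
                index[(key, value)].add(doc_id)
    elif isinstance(node, list):
        # Only objects and arrays inside arrays are descended into here.
        for item in node:
            index_properties(doc_id, item, index)


# Two hypothetical documents that hold the same fact at different depths.
docs = {
    "op-1": {"operation": {"surgeonName": "Dorothy Oz", "operationNumber": 13}},
    "op-2": {"hospital": {"operations": [{"surgeonName": "Dorothy Oz"}]}},
}

index = defaultdict(set)
for doc_id, doc in docs.items():
    index_properties(doc_id, doc, index)

# The query names the property, not the path to it.
matches = sorted(index[("surgeonName", "Dorothy Oz")])
print(matches)
```

Because the query never spells out a path, moving `surgeonName` under a different parent does not break it, which is the access-path dependence Codd objected to.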

RDF Graph and XML enable us to turn content into meaningful knowledge.

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries

[Figure: Customer Order documents connected by graph relationships. JSON documents: Customers, Orders, Products, Vendors. XML documents: Documentation. Relationship labels: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents, invoices, etc.; Product manual; Vendor order forms, invoices, etc.; Customer liked product; Vendor received RMA on product.]
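A minimal sketch of the idea on this slide, assuming plain Python tuples stand in for RDF triples: documents are business entities, and relationships are data that can be added at run time and followed across entity types. The identifiers and the predicate names `placedOrder`, `orderedProduct`, `soldBy`, and `receivedRmaOn` are hypothetical; only `likesProduct` appears in the deck's own examples.

```python
# Relationships are data: each triple is (subject, predicate, object).
triples = [
    ("customer/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("product/1111", "soldBy", "vendor/555"),
    ("customer/11", "likesProduct", "product/1111"),
]


def related(subject, predicate):
    """Return every object linked to the subject by the predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(start, *predicates):
    """Walk a chain of predicates, one hop per predicate."""
    frontier = [start]
    for predicate in predicates:
        frontier = [o for s in frontier for o in related(s, predicate)]
    return frontier


# Which vendors sold the products this customer ordered?
vendors = follow("customer/11", "placedOrder", "orderedProduct", "soldBy")

# A new kind of relationship needs no schema change, only a new triple.
triples.append(("vendor/555", "receivedRmaOn", "product/1111"))
rma_products = related("vendor/555", "receivedRmaOn")
print(vendors, rma_products)
```

The three-hop traversal crosses customer, order, product, and vendor documents without any predefined foreign-key path between those types.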

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
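As a rough illustration of reusing predicate names, the sketch below builds triples whose predicates come from two of the vocabularies listed above (FOAF and Dublin Core terms) instead of invented names. The `example.org` subject IRIs and the `to_ntriples` helper are assumptions for illustration, and the helper does no literal escaping, so it is not a complete N-Triples serializer.

```python
# Namespace IRIs of two published vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical subjects; the predicates come from the shared vocabularies.
triples = [
    ("http://example.org/person/11", FOAF + "name", "Mike Bowers"),
    ("http://example.org/person/11", FOAF + "knows", "http://example.org/person/22"),
    ("http://example.org/doc/manual-1111", DCTERMS + "creator", "http://example.org/person/11"),
]


def to_ntriples(subject, predicate, obj):
    """Render one triple as an N-Triples style line (no escaping of literals)."""
    obj_term = f"<{obj}>" if obj.startswith("http") else f'"{obj}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


lines = [to_ntriples(*t) for t in triples]
print("\n".join(lines))
```

Anyone who already knows `foaf:knows` or `dcterms:creator` can read these relationships without a project-specific glossary, which is the time saving the slide points to.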

New Data Structures for Variety and Variability

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

• Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
  NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

• Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
  NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

• Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
  NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

• Relational: Each table may have zero to a few fixed relationships to other tables. Constraints are not data.
  NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

• Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
  NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.

Why is relational fixed and dense?

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL |
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Callouts: Fixed field types; Fixed table structure; Fixed table relationships; Each row has same structure; Fixed set of tables.
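The "fixed table structure" point can be seen in a few lines with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table mirrors the operation/drug table above, and the `drug_route` column is a hypothetical new attribute. A row cannot carry a field the table does not define; the schema has to be altered first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE operation_drug ("
    " operation_id INTEGER NOT NULL,"
    " drug_id INTEGER NOT NULL,"
    " drug_dose INTEGER,"  # nullable, so values may be sparse
    " drug_dose_uom TEXT)"
)
conn.execute("INSERT INTO operation_drug VALUES (13, 100, 200, 'mg')")
conn.execute("INSERT INTO operation_drug VALUES (13, 101, NULL, NULL)")

# A new attribute is rejected until the table structure itself is changed.
try:
    conn.execute(
        "INSERT INTO operation_drug (operation_id, drug_id, drug_route) "
        "VALUES (13, 100, 'oral')"
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# It takes a schema change (code) before any row can hold the new field.
conn.execute("ALTER TABLE operation_drug ADD COLUMN drug_route TEXT")
conn.execute(
    "INSERT INTO operation_drug (operation_id, drug_id, drug_route) "
    "VALUES (13, 100, 'oral')"
)
columns = [row[1] for row in conn.execute("PRAGMA table_info(operation_drug)")]
print(rejected, columns)
```

After the `ALTER TABLE`, every existing row also gains the new column (as NULL), which is the dense, same-structure-per-row behavior the slide describes.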

How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties (the second administered drug is marked "sparse property" because it has no dose); Sparse Denormalized Properties.
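A short sketch of what sparse properties mean for code that reads such a document. The dictionary below repeats part of the operation document from this slide; the `total_dose` helper is hypothetical. The reader tolerates absent properties rather than relying on every entry having the same shape.

```python
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "hospitalName": "Johns Hopkins",
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            # Sparse entry: no dose properties at all.
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
}


def total_dose(doc, drug_name, uom="mg"):
    """Sum the doses of one drug, skipping entries that lack dose properties."""
    drugs = doc.get("operation", {}).get("administeredDrugs", [])
    return sum(
        d.get("drugDoseSize", 0)
        for d in drugs
        if d.get("drugName") == drug_name and d.get("drugDoseUOM") == uom
    )


drugs = operation_doc["operation"]["administeredDrugs"]
total = total_dose(operation_doc, "Minocycline")
undosed = [d["drugName"] for d in drugs if "drugDoseSize" not in d]
print(total, undosed)
```

No NULL placeholders are stored for the missing dose; the property simply is not there, and the document may also belong to more than one collection at once.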

Choosing Between XML and JSON

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date <date xsi:type="xsd:date">2017-04-02Z</date>, to be freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects:" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
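To make the contrast concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` and `json` modules on simplified versions of the two samples (namespaces and attributes are left out, which is an assumption made to keep the example short). In XML the date is a tag sprinkled into running text; in JSON the text and the date are separate objects.

```python
import json
import xml.etree.ElementTree as ET

# Simplified versions of the slide's samples.
xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text.</paragraph>"
)
json_text = """
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects:"},
  {"type": "date", "value": "2017-04-02Z"}
]}
"""

# XML: the text is primary; tags mark up spans inside it.
root = ET.fromstring(xml_text)
xml_plain_text = "".join(root.itertext())
xml_date = root.find("date").text

# JSON: objects are primary; text is one kind of value inside them.
doc = json.loads(json_text)
json_date = next(p["value"] for p in doc["paragraph"] if p["type"] == "date")
json_plain_text = " ".join(p["value"] for p in doc["paragraph"] if p["type"] == "text")

print(xml_plain_text)
print(xml_date, json_date)
```

Stripping the tags from the XML still leaves a readable sentence, while the JSON has no sentence until the program assembles one from its objects.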

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Customer Model

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.
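Why multivalued properties multiply tables can be sketched in a few lines. The `shred` helper and the reduced person document below are assumptions for illustration (purposes are single strings here, not arrays): every array property in the document becomes its own child table keyed back to the parent, so the table count grows with the number of arrays.

```python
person_doc = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
    ],
    "personEmails": [
        {"emailPurpose": "work", "emailAddress": "mike@somework.com"},
    ],
}


def shred(doc, key_name):
    """Split a document into one parent row plus one child table per array property."""
    tables = {"parent": [{}]}
    for name, value in doc.items():
        if isinstance(value, list):
            # Each array item becomes a row carrying the parent's key.
            tables[name] = [{key_name: doc[key_name], **item} for item in value]
        else:
            tables["parent"][0][name] = value
    return tables


tables = shred(person_doc, "personId")
for name, rows in tables.items():
    print(name, rows)
```

One document with two array properties already needs three tables; the full person model, with many more multivalued properties, reaches the 13 tables counted on the slide.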

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}


More Realistic Customer Order Model

[Figure: entity-relationship diagram with many interlinked tables. Table labels in the figure include Order, Order Items, Fact, and Lookup Lists; the column lists repeat those of the person tables in the Customer Model.]

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities

[Figure: Customer Order as four JSON document types (Customers, Orders, Products, Vendors) connected by the relationships "Product that was ordered" and "Vendor who sold the product".]

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": …
    },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
          … (remainder cropped on the slide)

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…
          "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    … (remainder cropped on the slide)

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" …
      { "phone": { "phonePurpose": "support" …
      { "phone": { "phonePurpose": "shipping" …
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping" …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936" …
    "companyEmails": [
      { "email": { "emailPurpose": "sales" …
    "companyWebsites": [
      { "website": { "websitePurpose": "home" …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555…
          "paymentMethodVerified": true …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…
  (right edge of this document is cropped on the slide)

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure: example tables illustrating one-to-one, one-to-many, many-to-many, and reference-table relationships]
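As a rough illustration of the first rule, the sketch below flattens a multivalued attribute into a parent row plus one-to-many child rows. The function name and the row and column names (`person_id`, `phone_number`) are assumptions made for the example, not part of the original model.

```python
def normalize_person(doc):
    """Split one nested person record into single-valued rows.

    Returns (person_row, phone_rows); each phone row points back to
    its parent through the person_id key (a one-to-many relationship).
    """
    person_row = {"person_id": doc["personId"], "person_name": doc["personName"]}
    phone_rows = [
        {"person_id": doc["personId"], "phone_number": p["phoneNumber"]}
        for p in doc.get("personPhones", [])
    ]
    return person_row, phone_rows


nested = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phoneNumber": "+1 (111) 111-1111"},
        {"phoneNumber": "+1 (111) 111-1112"},
    ],
}
person_row, phone_rows = normalize_person(nested)
```

Every multivalued attribute handled this way adds another table and another join at read time.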

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
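A minimal sketch of the first two bullets, assuming the nesting shown in the sample person document: one entity becomes one JSON document, and its phones are embedded as an array instead of living in a child table. The helper name `to_person_document` and the input row layout are hypothetical.

```python
import json


def to_person_document(person_row, phone_rows):
    """Assemble one JSON-ready document for a single person entity."""
    return {
        "personId": person_row["person_id"],
        "person": {
            "personName": person_row["person_name"],
            # Multi-valued property: each phone is an embedded structure.
            "personPhones": [
                {"phonePurpose": r["purposes"], "phoneNumber": r["number"]}
                for r in phone_rows
                if r["person_id"] == person_row["person_id"]
            ],
        },
    }


doc = to_person_document(
    {"person_id": 11, "person_name": "Mike Bowers"},
    [
        {"person_id": 11, "purposes": ["mobile", "primary"], "number": "+1 (111) 111-1111"},
        {"person_id": 11, "purposes": ["work"], "number": "+1 (111) 111-1112"},
    ],
)
person_json = json.dumps(doc, indent=2)
```

Note that `phonePurpose` is itself an array, so a single phone can be both `mobile` and `primary` without a separate many-to-many table.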

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
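The idea can be sketched in plain Python. This is not the TDE or Optic API. It is a toy stand-in, with hypothetical helper names (`project_triples`, `employer_names`) and a simplified document shape, that assumes each document may carry a `triples` array as in the sample. Facts are projected out of the documents into one triple list, and a query then follows a link to read data that lives in another document.

```python
def project_triples(docs):
    """Collect (subject, predicate, object) facts from every document."""
    triples = []
    for doc in docs:
        # Relationship triples embedded in the document itself.
        for t in doc.get("triples", []):
            triples.append((t["subject"], t["predicate"], t["object"]))
        # A derived fact: expose the entity's name as a triple as well.
        if "name" in doc:
            triples.append((doc["id"], "hasName", doc["name"]))
    return triples


def employer_names(triples, person_id):
    """Join hasEmployer links to hasName facts without copying any data."""
    names = {s: o for s, p, o in triples if p == "hasName"}
    return [
        names[o]
        for s, p, o in triples
        if s == person_id and p == "hasEmployer" and o in names
    ]


docs = [
    {"id": 11, "name": "Mike Bowers",
     "triples": [{"subject": 11, "predicate": "hasEmployer", "object": 666}]},
    {"id": 666, "name": "Goliath Dairies"},
]
all_triples = project_triples(docs)
```

The company name is stored once, in the company document, yet `employer_names(all_triples, 11)` reads it as though it were part of the person.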


Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: business tables, transaction tables, and reference tables]


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects lend themselves to subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
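One way to read the subclassing in the sample, sketched under the assumption that each subclass is an optional top-level section whose type is listed in the `schemas` array. The helper `subclasses_of` is hypothetical, and the document is trimmed to a few properties.

```python
def subclasses_of(doc, base="person"):
    """Return the subclass sections that a document actually carries."""
    declared = [s["type"] for s in doc.get("schemas", [])]
    return [t for t in declared if t != base and t in doc]


person_doc = {
    "personId": 11,
    "schemas": [
        {"type": "person"},
        {"type": "customer"},
        {"type": "companyRelationships"},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"companyIdFk": 555, "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
    ],
}
roles = subclasses_of(person_doc)
```

Because each subclass sits in its own named section, the general `person` structure stays readable while the document still says plainly which roles this person plays.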

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
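Why unique names matter can be shown with a toy index. The sketch below is only an illustration in plain Python, not how MarkLogic implements its indexes: it keys every scalar value by property name alone, ignoring the path, so a generic name such as `name` would mix people and companies while `personName` and `customerName` stay distinct. The function name and document IDs are hypothetical.

```python
from collections import defaultdict


def index_by_property_name(doc_id, node, index):
    """Record (property name, value) -> document ids, ignoring hierarchy."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                # Descend, but do not remember the path taken.
                index_by_property_name(doc_id, value, index)
            else:
                index[(key, value)].add(doc_id)
    elif isinstance(node, list):
        for item in node:
            index_by_property_name(doc_id, item, index)


index = defaultdict(set)
index_by_property_name("person-11", {"person": {"personName": "Mike Bowers"}}, index)
index_by_property_name(
    "order-1", {"order": {"customer": {"customerName": "Mike Bowers"}}}, index
)
```

An equality lookup on `("personName", "Mike Bowers")` finds only the person document, even though the same value also appears two levels deep in the order.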


Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
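The arithmetic behind the relational estimate, as a small worked example. The per-join figures are the ones quoted in the bullets; the function name and parameter names are hypothetical.

```python
def join_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return (low, high) IO estimates for retrieving one entity via joins."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


relational_ios = join_io_range(13)  # (52, 78)
```

Against that range, the single IO quoted for fetching one document is the comparison the slide is making.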

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

Equivalent relational tables:
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
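To make the contrast concrete, the sketch below shreds the nested address array into the rows a relational model would need. The `{"address": {...}}` wrapper, the surrogate `address_id`, and the function name are assumptions made for illustration; they are not taken from the deck.

```python
# A person document with a multi-valued address property (shape assumed).
person = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }}
    ],
}


def shred_addresses(doc):
    """Flatten nested addresses into Addresses and Persons Addresses rows."""
    addresses, persons_addresses = [], []
    for address_id, item in enumerate(doc["personAddresses"], start=1):
        addr = item["address"]
        addresses.append({
            "address_id": address_id,  # hypothetical surrogate key
            "street": " ".join(addr["addressStreet"]),
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postal_code": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        # One join-table row per purpose, because a column holds one value.
        for purpose in addr["addressPurpose"]:
            persons_addresses.append({
                "person_id": doc["personId"],
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return addresses, persons_addresses


addresses, persons_addresses = shred_addresses(person)
print(len(addresses), "address row,", len(persons_addresses), "join rows")
```

One nested array element becomes one Addresses row plus one Persons Addresses row per purpose, which is the extra bookkeeping the document model avoids.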

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
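A minimal sketch of how application code might consume such a mixed array follows. The records are simplified (the `paymentMethod` wrapper is omitted) and the `describe` helper is hypothetical; the point is only that each record carries its own type and its own set of properties.

```python
# Two record types share one array; each is dispatched on its own type field.
payment_methods = [
    {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
     "creditCardNumber": "1234123412341234",
     "creditCardExpirationDate": "2011-11-11"},
    {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
     "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"},
]


def describe(method):
    """Summarize a payment method using only the fields its type defines."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking, routing {method['bankRoutingNumber']}"
    return f"Unknown method type {kind}"


for method in payment_methods:
    print(describe(method))
```

In a relational model the same data would typically need one table per payment type plus a join table for each, as in the Visa and Bank payment tables of the person ERD.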


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values

{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
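Querying structure for the presence of a key can be sketched in a few lines. The recursive helper below is a generic illustration written for this example, not a MarkLogic API; the document shape is assumed from the sample on this slide.

```python
def find_key(node, key, path=()):
    """Yield (path, value) for every occurrence of `key` at any depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,), value
            yield from find_key(value, key, path + (name,))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from find_key(item, key, path + (position,))


person = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Which phones carry the optional hidePhoneNumber key, and where?
hidden = list(find_key(person, "hidePhoneNumber"))
print(hidden)
```

Because the key is found by walking the data, the same query keeps working if an optional property is added to other phones later.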

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
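These data types can be seen by parsing a small document with Python's standard `json` module. The keys and values below are invented for the example.

```python
import json

# One document exercising every JSON type named on the slide.
text = """
{
  "a key with spaces & symbols!": "na\\u00efve caf\\u00e9",
  "integer": 42,
  "decimal": 5.75,
  "flag": true,
  "mixed": [1, "two", false, {"nested": null}],
  "sparse": {}
}
"""

doc = json.loads(text)
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)
```

The parser maps each JSON type to a native one (string, integer, float, boolean, list, dict), and the array mixes four different types without any schema declaration.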

Document PROs Meaningful Data Modeling

49

A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of entities annotating the excerpt. Legend: T Topic, P Person, L Location, P Publication, R Reference, O Organization, E Event. Relationship labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: JSON documents (Customers, Orders, Products, Vendors) and XML Documentation connected by graph relationships. Relationship labels: Customer Order; Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product. The XML sample shows Hospital Name: Johns Hopkins; Operation Number: 13; Operation Type: Heart Transplant; Surgeon Name: Dorothy Oz; and drugs Minicillan (Drugs R Us, 200 mg), Maxicillan (Canada4Less Drugs, 400 mg), Minicillan (Drug USA, 150 mg).]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Drug Name    Drug Manufacturer    Dose Size    Dose UOM
Minicillan   Drugs R Us           200          mg
Maxicillan   Canada4Less Drugs    400          mg
Minicillan   Drug USA             150          mg

Hospital Name: Johns Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz



Power of Combining XML Document and RDF Graph

Data + Narrative = Contextual Information
+ Relationships (Semantic & Structural) = Meaningful Knowledge

[Figure: excerpt from E. F. Codd, "A Relational Model of Data for Large Shared Data Banks" (June 1970), annotated with a graph of entities. Legend: T Topic, P Person, L Location, P Publication, R Reference, O Organization, E Event. Relationship labels: geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, purpose, solution.]

WAIT A MINUTE
Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy
2. They contain redundant data throughout the hierarchy that is hard to update
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities
3. They are easily inconsistent because it is easy to fail to update redundant data
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL
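The first answer, that indexes flatten the structure so queries survive hierarchy changes, can be illustrated with a toy inverted index. This is only a sketch of the idea; it is not how MarkLogic implements its indexes, and the documents and function names are invented.

```python
from collections import defaultdict


def leaf_pairs(node, key=None):
    """Yield (key, value) for every scalar leaf, ignoring where it is nested."""
    if isinstance(node, dict):
        for name, value in node.items():
            yield from leaf_pairs(value, name)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_pairs(item, key)
    else:
        yield key, node


def build_index(docs):
    """Map each (key, value) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for key, value in leaf_pairs(doc):
            index[(key, value)].add(doc_id)
    return index


# The same fact stored under two different hierarchies.
docs = {
    "v1": {"person": {"personName": "Mike Bowers"}},
    "v2": {"customer": {"profile": {"personName": "Mike Bowers"}}},
}

index = build_index(docs)
print(sorted(index[("personName", "Mike Bowers")]))
```

A lookup on the key and value finds both documents even though the property moved two levels deeper in the second one, so the program issuing the query does not depend on the access path.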

14

RDF Graph and XML enable us to turn content into meaningful knowledge

What can RDF Graph and JSON do for data?

15

Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries

16

[Figure: JSON documents (Customers, Orders, Products, Vendors) and XML Documentation connected by graph relationships. Relationship labels: Customer Order; Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents, invoices, etc.; Product manual; Vendor order forms, invoices, etc.; Customer liked product; Vendor received RMA on product. The XML sample shows Hospital Name: Johns Hopkins; Operation Number: 13; Operation Type: Heart Transplant; Surgeon Name: Dorothy Oz; and drugs Minicillan (Drugs R Us, 200 mg), Maxicillan (Canada4Less Drugs, 400 mg), Minicillan (Drug USA, 150 mg).]

RDF = Standard Meaningful Graphs

17

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
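As a small illustration of reusing published predicates, the triples below borrow `name` and `knows` from FOAF and `creator` from Dublin Core Terms instead of inventing new relationship names. The `example.org` subject and object IRIs and the `objects` helper are hypothetical, and plain tuples stand in for a real triple store.

```python
# Namespace IRIs of two existing vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# (subject, predicate, object) triples; the example.org IRIs are made up.
triples = [
    ("http://example.org/person/11", FOAF + "name", "Mike Bowers"),
    ("http://example.org/person/11", FOAF + "knows",
     "http://example.org/person/22"),
    ("http://example.org/doc/1", DCTERMS + "creator",
     "http://example.org/person/11"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects("http://example.org/person/11", FOAF + "knows"))
```

Anyone who already knows FOAF or Dublin Core can read these relationships without a data dictionary, which is the time saving the slide describes.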

New Data Structures for Variety and Variability

18

New Data Structures for Variety and Variability

19

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense

20

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Callouts: Fixed field types; Fixed table structure; Fixed table relationships; Each row has same structure; Fixed set of tables

How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data

21

{ "_id": 1, "_type": "operation", "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties (the Minomycin entry omits the dose); Sparse Denormalized Properties
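Sparse properties mean that readers of the document treat a missing key as "no value" rather than relying on a NULL column. The sketch below uses the administered drugs from this slide; the `dose_label` helper is hypothetical.

```python
# The second drug omits its dose: the property is absent, not NULL.
administered_drugs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def dose_label(drug):
    """Format the dose, tolerating documents that omit it."""
    size = drug.get("drugDoseSize")
    if size is None:
        return "dose not recorded"
    return f"{size} {drug.get('drugDoseUOM', '')}".strip()


labels = [dose_label(drug) for drug in administered_drugs]
total_dose = sum(drug.get("drugDoseSize", 0) for drug in administered_drugs)
print(labels, total_dose)
```

The relational version of the same data stores two NULLs for the Minomycin row; here the keys are simply not present.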

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date, <date xsi:type="xsd:date">2017-04-02Z</date>, to be freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.


JSON is ideal for data structures
Developers work with data with maximal reliance on predictable structures

XML is ideal for content
Developers work with tagged content with minimal reliance on variable structure


Choosing Between XML and JSON
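The difference between tags sprinkled into text and text poured into objects shows up when both forms are parsed. The sketch below uses only Python's standard library; the sample strings are shortened versions of the slide's examples, with namespaces and attributes left out.

```python
import json
import xml.etree.ElementTree as ET

# XML: the date element sits inside running text (mixed content).
xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text.</paragraph>"
)
root = ET.fromstring(xml_text)
xml_plain = "".join(root.itertext())   # the text with tags removed
xml_date = root.find("date").text      # the data embedded in the text

# JSON: text and data live in separate typed objects in an array.
json_text = """
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects "},
  {"type": "date", "value": "2017-04-02Z"}
]}
"""
parts = json.loads(json_text)["paragraph"]
json_plain = "".join(part["value"] for part in parts)
json_date = next(p["value"] for p in parts if p["type"] == "date")

print(xml_plain)
print(json_plain)
```

Both forms carry the same date, but the XML keeps the sentence intact around it while the JSON has to split the sentence into an array of objects.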

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.

Person ERD

One Person JSON Doc = 13 Tables

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Relational tables (table name: columns)

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

More Realistic Customer Order Model

[Figure: relational diagram made of many tables. It reuses the column lists above under placeholder table names.]

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: four kinds of JSON documents (Customers, Orders, Products, Vendors) connected by graph relationships labeled "Customer Order", "Product that was ordered", and "Vendor who sold the product".]

Customer document:

{ "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 } }

Order document (the second ordered product is cut off at the bottom of the slide):

{ "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" …

Product document (clipped at the right edge of the slide; "…" marks where the text is cut off):

{ "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 101,
          "productWidth": 45, "productHeight": 17, "productLength": 67 …
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…
          "productWidth": 5, "productHeight": 2, "productLength": 8 …
    "productWebsites": [ …

Vendor (company) document (clipped at the right edge of the slide):

{ "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": sales …
      { "phone": { "phonePurpose": support …
      { "phone": { "phonePurpose": shipping …
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping" …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936" …
    "companyEmails": [
      { "email": { "emailPurpose": sales …
    "companyWebsites": [
      { "website": { "websitePurpose": home …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555…
          "paymentMethodVerified": true …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…

Lookup document:

{ "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ] }


Relational Modeling: Normalize

1. Normalize

• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

[Figure: example tables illustrating one-to-one, one-to-many, and many-to-many relationships, plus reference tables.]
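As a rough illustration of the first bullet, the sketch below splits a record with a multivalued attribute into a single-valued parent row and one-to-many child rows. The field names (`person_id`, `phones`, `phone_id`) and the helper `normalize_person` are hypothetical and chosen only for this example.

```python
# Hypothetical input: one person record with a multivalued "phones" attribute.
person = {
    "person_id": 11,
    "name": "Mike Bowers",
    "phones": [
        {"purpose": "mobile", "number": "+1 (111) 111-1111"},
        {"purpose": "work", "number": "+1 (111) 111-1112"},
    ],
}


def normalize_person(record):
    """Split a record into a single-valued parent row and one-to-many child rows."""
    # Parent table: keep only the single-valued attributes.
    person_row = {k: v for k, v in record.items() if not isinstance(v, list)}
    # Child table: one row per phone, with its own key and a foreign key to the person.
    phone_rows = [
        {"phone_id": i, "person_id": record["person_id"], **phone}
        for i, phone in enumerate(record["phones"], start=1)
    ]
    return person_row, phone_rows


person_row, phone_rows = normalize_person(person)
```

Reading the person back with its phones now requires a join on `person_id`, which is the cost the document model avoids by embedding the phones.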

Doc/Graph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it

{ "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ] }
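To make the embedding idea concrete, here is a minimal sketch in plain Python. It is not MarkLogic's TDE or Optic API. It loads a cut-down version of the person document and projects its embedded phone array into flat rows, which is conceptually the kind of view a template could expose. The `{"phone": {...}}` wrapper follows one reading of the slide's structure, and `project_phone_rows` is a hypothetical helper.

```python
import json

# Cut-down person document; the nesting follows one reading of the slide.
person_doc = json.loads("""
{ "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personPhones": [
      {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
      {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}}
    ]
  }
}
""")


def project_phone_rows(doc):
    """Flatten the embedded phones into (personId, phoneNumber, purpose) rows."""
    rows = []
    for item in doc["person"]["personPhones"]:
        phone = item["phone"]
        # phonePurpose is itself multi-valued, so emit one row per purpose.
        for purpose in phone["phonePurpose"]:
            rows.append((doc["personId"], phone["phoneNumber"], purpose))
    return rows


rows = project_phone_rows(person_doc)
```

The document stays whole and hierarchical. The flat rows are derived from it on demand rather than being the storage format.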

Doc/Graph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
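The idea of "denormalizing without denormalizing" can be sketched with a toy in-memory triple index. This is plain Python and only an analogy for what the triple index and the Optic API provide. The ids, predicates, and names come from the sample data in this deck, while `linked_value` and `person_view` are hypothetical helpers.

```python
from collections import defaultdict

# Each document stores only its own data; nothing is copied between documents.
docs = {
    11: {"personName": "Mike Bowers"},
    666: {"companyName": "Goliath Dairies"},
    1111: {"productName": "Oreo Thins Sandwich Cookies"},
}

# Triples link documents: (subject, predicate, object).
triples = [
    (11, "hasEmployer", 666),
    (11, "likesProduct", 1111),
]

# Toy triple index: (subject, predicate) -> list of object ids.
index = defaultdict(list)
for s, p, o in triples:
    index[(s, p)].append(o)


def linked_value(subject_id, predicate, prop):
    """Follow triples from a subject and read one property from each linked document."""
    return [docs[o][prop] for o in index[(subject_id, predicate)] if prop in docs.get(o, {})]


def person_view(person_id):
    """Present linked data as if it were part of the person document."""
    view = dict(docs[person_id])
    view["employerName"] = linked_value(person_id, "hasEmployer", "companyName")
    view["likedProducts"] = linked_value(person_id, "likesProduct", "productName")
    return view


view = person_view(11)
```

The company name appears in the person view, but it is stored once, in the company document, so renaming the company needs no update to any person document.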


Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: example business tables, transaction tables, and reference tables.]


Doc/Graph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc
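As a small sketch of the last two bullets, the code below keeps two lookup lists in a single lookup document and checks a self-contained order document against them. The status values follow one reading of the deck's lookup sample, and `invalid_statuses` is a hypothetical helper, not part of any MarkLogic API.

```python
# One lookup document holding several lookup lists.
lookups = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# A transaction document that carries everything about the order.
order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        ],
    },
}


def invalid_statuses(order_doc, lookup_doc):
    """Return (property, value) pairs whose value is missing from the lookup lists."""
    problems = []
    body = order_doc["order"]
    if body["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(("orderStatus", body["orderStatus"]))
    for item in body["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookup_doc["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


problems = invalid_statuses(order, lookups)
```

Because both lists live in one document, a single read fetches every reference value needed to validate the order.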

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not overgeneralize in relational because it hides the purpose of the model

Doc/Graph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

{ "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [ … ],
  "person": { "personStatus": "active", "personName": "Mike Bowers", … },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ] }
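One way to read the subclassing described above is that a person document takes on a role simply by carrying the matching section. The sketch below is an interpretation under that assumption, using a cut-down person document. `roles_of` is a hypothetical helper.

```python
# Cut-down person document: the base "person" section plus two subclass sections.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def roles_of(doc):
    """List the roles a person plays, based on which sections the document carries."""
    roles = []
    # The presence of a "customer" section marks the person as a customer.
    if "customer" in doc:
        roles.append("customer")
    # Each company relationship contributes its own relationship types.
    for rel in doc.get("companyRelationships", []):
        roles.extend(rel["company"]["personCompanyRelationship"])
    return roles


roles = roles_of(person_doc)
```

A person who is not a customer would simply omit the `customer` section, so no schema change or empty columns are needed.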

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

Doc/Graph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
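The reason for unique property names can be shown with a toy name-based index. This is plain Python and only mimics the idea of indexing by property name regardless of position. It says nothing about how MarkLogic implements its indexes. The `generic_doc` with a bare `status` property is a hypothetical counterexample.

```python
from collections import defaultdict


def index_by_property_name(node, index=None, name=None):
    """Index every scalar value by its property name, ignoring where it sits in the hierarchy."""
    if index is None:
        index = defaultdict(list)
    if isinstance(node, dict):
        for key, value in node.items():
            index_by_property_name(value, index, key)
    elif isinstance(node, list):
        # Array items are indexed under the name of the property that holds the array.
        for item in node:
            index_by_property_name(item, index, name)
    elif name is not None:
        index[name].append(node)
    return index


# Unique names, as in the deck's person document.
unique_doc = {
    "person": {"personStatus": "active"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [{"company": {"personCompanyStatus": "active"}}],
}

# Hypothetical alternative that reuses the generic name "status".
generic_doc = {
    "person": {"status": "active"},
    "customer": {"status": "suspended"},
}

unique_index = index_by_property_name(unique_doc)
generic_index = index_by_property_name(generic_doc)
```

With unique names, a lookup on `customerStatus` is unambiguous. With the generic name, both values land under `status`, and the index alone cannot tell the person's status from the customer's.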


Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.


Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
• 4-to-6 IOs x 13 joins = 52-to-78 IOs.
• MarkLogic indexes are in RAM, so getting a doc is 1 IO.
• MongoDB indexes are B-tree indexes and may require 4 IOs.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
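The IO arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's own estimate: the function name and its parameters are illustrative, and the per-join costs are the slide's figures, not measurements.

```python
def relational_io_range(tables, min_ios_per_join=4, max_ios_per_join=6):
    """Return the (low, high) IO estimate for reading one business entity
    that is spread across `tables` joined tables."""
    return tables * min_ios_per_join, tables * max_ios_per_join


# The person entity needs 13 tables in the relational model.
low, high = relational_io_range(13)
print(f"relational: {low}-to-{high} IOs, document: 1 IO")
```

The estimate scales linearly with the number of tables, which is why a wide business entity is where the single-document read pays off most.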


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Templated Data Extraction.

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Relational tables for the same data:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
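As a rough illustration of the multi-valued idea, the address above can be written as a Python dictionary and filtered by purpose without a join table. This is a minimal sketch: the `address` wrapper key from the slide is flattened for brevity, and `addresses_for` is a hypothetical helper, not a MarkLogic API.

```python
# One person document; both the address list and addressPurpose are multi-valued.
person = {
    "personId": 11,
    "personAddresses": [
        {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }
    ],
}


def addresses_for(doc, purpose):
    """Return every address in the document that is tagged with `purpose`."""
    return [
        address
        for address in doc.get("personAddresses", [])
        if purpose in address.get("addressPurpose", [])
    ]


print(addresses_for(person, "shipping"))
```

In the relational model the same question needs the Persons, Persons Addresses, and Addresses tables; here it is a filter over one document.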

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Keeping different types of records in the same array is natural in programming but very difficult in relational databases.

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
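A short Python sketch shows why a multi-typed array is natural in code: each record carries its own type tag and only the properties that type needs. The `describe` helper is hypothetical and the `paymentMethod` wrapper key is flattened; this is an illustration, not the author's implementation.

```python
# Two different record shapes in the same array, distinguished by paymentMethodType.
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    },
]


def describe(method):
    """Summarize a payment method using only the fields its type defines."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending in {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending in {method['bankAccountNumber'][-4:]}"
    return f"Unknown payment method type: {kind}"


for method in payment_methods:
    print(describe(method))
```

A relational design would typically need one table per payment type plus a join table per type, as in the Visa and Bank payment tables of the customer model.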


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 575
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
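The claim that structure can be queried and changed like any other data can be sketched in plain Python. `key_paths` is a hypothetical helper written for this illustration; a database such as MarkLogic would answer the same question from its indexes rather than by walking the document.

```python
def key_paths(node, key, path=()):
    """Yield the path to every occurrence of `key`, at any nesting depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from key_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from key_paths(item, key, path + (index,))


person = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}

# Query structure: which phones carry the optional hidePhoneNumber key?
print(list(key_paths(person, "hidePhoneNumber")))

# Change structure by updating data: give the first phone the same key.
person["person"]["personPhones"][0]["hidePhoneNumber"] = False
print(list(key_paths(person, "hidePhoneNumber")))
```

No schema change is involved in the update; the new key simply exists in one more place afterwards.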

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: a graph of Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes connected by labeled relationships: "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution".]

[Figure: JSON documents (Customers, Orders, Products, Vendors), an XML hospital operation record, and Documentation, connected by labeled relationships: "Customer Order", "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product".]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: Johns Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size   Dose UOM
Minicillan    Drugs R Us            200         mg
Maxicillan    Canada4Less Drugs     400         mg
Minicillan    Drug USA              150         mg



WAIT A MINUTE
Aren't JSON, XML, and RDF databases hierarchical and graph?

Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?

1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.

RDF Graph and XML enable us to turn content into meaningful knowledge.

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries

[Figure: JSON documents (Customers, Orders, Products, Vendors), an XML hospital operation record, and Documentation, connected by labeled relationships: "Customer Order", "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents, invoices, etc.", "Product manual", "Vendor order forms, invoices, etc.", "Customer liked product", and "Vendor received RMA on product".]

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
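To make the idea of reusing predicate names concrete, here is a small Python sketch that stores triples as plain tuples and borrows predicates from FOAF and Dublin Core Terms. The subject IRIs under `example.org` and the `objects` helper are hypothetical; a real system would use a triple store and SPARQL rather than a list.

```python
# Namespace IRIs of two published vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical subject IRIs, invented for this illustration.
PERSON = "http://example.org/person/11"
MANUAL = "http://example.org/doc/product-manual-1111"

# Each triple is (subject, predicate, object).
triples = [
    (PERSON, FOAF + "name", "Mike Bowers"),
    (MANUAL, DCTERMS + "title", "Product manual"),
    (MANUAL, DCTERMS + "creator", PERSON),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(PERSON, FOAF + "name"))
print(objects(MANUAL, DCTERMS + "creator"))
```

Because the predicates come from shared vocabularies, another team reading these triples can look up what `name` and `creator` mean instead of guessing.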

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense?

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Callouts: Fixed field types; Fixed table structure; Fixed table relationships; Each row has same structure; Fixed set of tables
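The fixed row structure can be demonstrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table and column names are invented for the illustration, and SQLite stands in for any relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE operation_drugs ("
    "operation_id INTEGER, drug_id INTEGER, "
    "drug_dose INTEGER, drug_dose_uom TEXT)"
)
conn.execute("INSERT INTO operation_drugs VALUES (13, 100, 200, 'mg')")
# Sparse values are possible, but only as NULLs in columns that already exist.
conn.execute("INSERT INTO operation_drugs VALUES (13, 101, NULL, NULL)")

# A row cannot carry a field the table does not define.
try:
    conn.execute(
        "INSERT INTO operation_drugs (operation_id, drug_id, drug_manufacturer) "
        "VALUES (13, 100, 'Drug USA')"
    )
    added_without_ddl = True
except sqlite3.OperationalError:
    added_without_ddl = False

# Changing the structure takes schema code, and it applies to every row.
conn.execute("ALTER TABLE operation_drugs ADD COLUMN drug_manufacturer TEXT")
conn.execute(
    "INSERT INTO operation_drugs VALUES (13, 100, 150, 'mg', 'Drug USA')"
)
rows = conn.execute(
    "SELECT drug_id, drug_dose, drug_manufacturer FROM operation_drugs"
).fetchall()
print(added_without_ddl, rows)
```

After the `ALTER TABLE`, the two older rows also have a `drug_manufacturer` column, filled with NULL, which is the dense structure the slide describes.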

How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.

_id 1
_type operation
collections [ operation transplants ]
operation
  hospitalName Johns Hopkins
  operationTypeName Heart Transplant
  surgeonName Dorothy Oz
  operationNumber 13
  administeredDrugs [
    drugName Minocycline  drugManufacturer Drugs R Us  drugDoseSize 200  drugDoseUOM mg
    drugName Minomycin  drugManufacturer Canada4Less  (sparse property)
    drugName Minocycline  drugManufacturer Drug USA  drugDoseSize 150  drugDoseUOM mg
  ]
relations
  values [
    subject 1  predicate opHospital  object 10  hospitalAddress 1057 Mayberry…
    subject 1  predicate opType  object 100  insuranceCode 21187
    subject 1  predicate opSurgeon  object 10000  surgeonSuccessRate 087
    subject 1  predicate opDrug  object 10000  drugEfficacy 08  drugRecalls 1
    subject 1  predicate opDrug  object 20000  drugEfficacy 05  drugRecalls 3
    subject 1  predicate opDrug  object 30000  drugEfficacy 07  drugRecalls 1
  ]

Callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties; Sparse Denormalized Properties
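Sparse properties mean that code reading the document has to tolerate missing keys instead of NULL columns. The following Python sketch uses the administered drugs from the slide; `total_dose` is a hypothetical helper added for illustration.

```python
administered_drugs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    # Sparse: this record simply omits the dose properties.
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def total_dose(drugs, name, uom="mg"):
    """Sum the recorded doses of `name` in unit `uom`, skipping records without a dose."""
    return sum(
        drug["drugDoseSize"]
        for drug in drugs
        if drug["drugName"] == name and drug.get("drugDoseUOM") == uom
    )


# Find the records whose dose was never recorded.
missing_dose = [d["drugName"] for d in administered_drugs if "drugDoseSize" not in d]

print(total_dose(administered_drugs, "Minocycline"))
print(missing_dose)
```

The absence of a key is itself queryable, which is the document counterpart of filtering on `IS NULL`.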

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.


JSON is ideal for data structures
Developers work with data, with maximal reliance on predictable structures

XML is ideal for content
Developers work with tagged content, with minimal reliance on variable structure
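The difference between tagged text and nested objects shows up as soon as the two are parsed. Below is a minimal Python sketch using only the standard library; the XML and JSON fragments are shortened versions of the slide's examples, with namespaces and attributes left out to keep it small.

```python
import json
import xml.etree.ElementTree as ET

# XML: the date is a tag sprinkled into running text (mixed content).
xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"
)
paragraph = ET.fromstring(xml_text)
plain_text = "".join(paragraph.itertext())  # the text with the tags removed
dates_in_xml = [date.text for date in paragraph.iter("date")]

# JSON: text and data live in separate typed objects.
json_text = """
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}
]}
"""
document = json.loads(json_text)
dates_in_json = [
    part["value"] for part in document["paragraph"] if part["type"] == "date"
]

print(plain_text)
print(dates_in_xml, dates_in_json)
```

In the XML version the sentence survives intact when the tags are stripped; in the JSON version there is no sentence to recover, only a list of objects, which is why the slide assigns content to XML and data structures to JSON.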


Choosing Between XML and JSON

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


The person ERD has 13 tables because, in a relational model, each multivalued property requires a separate table

All of this should be represented as one JSON document because it is one business entity
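To make the table count concrete, here is a minimal Python sketch that shreds a person document into relational-style rows. The document is a hypothetical, trimmed-down version of the slide's person, and the `shred` helper is an illustration only, not part of any library. Every array in the document forces one more table.

```python
from collections import defaultdict

# Hypothetical, simplified person document in the style of the slides.
person_doc = {
    "personId": "11",
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
        ],
        "personEmails": [
            {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"},
        ],
    },
}


def shred(doc):
    """Split one document into flat rows, one table per multivalued property."""
    tables = defaultdict(list)
    person_id = doc["personId"]
    scalars = {"personId": person_id}
    for key, value in doc["person"].items():
        if isinstance(value, list):
            # Each array becomes a child table with a foreign key to the person.
            for item in value:
                row = {"personId": person_id}
                for k, v in item.items():
                    # Nested arrays are joined into one string for brevity;
                    # a strict relational design would need yet another table.
                    row[k] = ", ".join(v) if isinstance(v, list) else v
                tables[key].append(row)
        else:
            scalars[key] = value
    tables["persons"].append(scalars)
    return dict(tables)


tables = shred(person_doc)
for name, rows in tables.items():
    print(name, len(rows))
```

Even this small document with two arrays yields three tables. The full person on the slide has many more multivalued properties, which is how the ERD grows to 13.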

Person ERD

One Person JSON Doc = 13 Tables

{
  "personId": "11",
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": "11", "predicate": "hasSpouse", "object": "22" } },
    { "triple": { "subject": "11", "predicate": "hasChild", "object": "33" } },
    { "triple": { "subject": "11", "predicate": "hasChild", "object": "44" } },
    { "triple": { "subject": "11", "predicate": "likesProduct", "object": "1111" } },
    { "triple": { "subject": "11", "predicate": "hasEmployer", "object": "666" },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": "555", "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": "1111", "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": "666", "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}



More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities


Diagram: four collections of JSON documents (Customers, Orders, Products, Vendors) joined by labeled graph relationships: Customer Order, Product that was ordered, Vendor who sold the product
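The idea of documents as entities and triples as relationships can be sketched in plain Python. The document keys and the predicate names (`placedOrder`, `containsProduct`, `soldByVendor`) are hypothetical choices for illustration, and the list-based lookup stands in for a real triple index.

```python
# Hypothetical in-memory stand-ins for four JSON documents.
documents = {
    "customer/11": {"personName": "Mike Bowers"},
    "order/1": {"orderNumber": "111-11-111"},
    "product/1111": {"productName": "Oreo Thins Sandwich Cookies"},
    "vendor/555": {"companyName": "Nabisco"},
}

# Each relationship is a (subject, predicate, object) triple with a named meaning.
triples = [
    ("customer/11", "placedOrder", "order/1"),
    ("order/1", "containsProduct", "product/1111"),
    ("product/1111", "soldByVendor", "vendor/555"),
]


def follow(subject, predicate):
    """Return every object linked from subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def vendors_for_customer(customer_id):
    """Walk customer -> order -> product -> vendor and collect vendor names."""
    names = []
    for order in follow(customer_id, "placedOrder"):
        for product in follow(order, "containsProduct"):
            for vendor in follow(product, "soldByVendor"):
                names.append(documents[vendor]["companyName"])
    return names


print(vendors_for_customer("customer/11"))
```

Each hop has an explicit predicate, so the meaning of the relationship is stated in the data rather than implied by a foreign-key column.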

{
  "personId": "11",
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{
  "orderId": "1",
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": "11",
      "customerName": "Mike Bowers"
    },
    "orderShipment": {
      "shipper": { "companyIdFk": "777", "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": ""
    },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": "1111",
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": "555", "sellerName": "Nabisco" },
          "supplier": { "supplierId": "555", "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": "2222",
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } }
    ]
  }
}

{
  "productId": "1111",
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": "555", "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": "555", "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1, …
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7, … } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…
          "productWidth": 5, "productHeight": 2, "productLength": 8, … } } ],
    "productWebsites": [ … ]
  }
}

{
  "companyId": "555",
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", … ],
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": "555…
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05… },
  "seller": { "sellerJoinDate": "2005-05… }
}

{
  "orderLookupId": "111111111",
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}

Same Realistic Customer Order Model

Relational Modeling Normalize

1. Normalize
• Make each attribute single valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
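The normalization steps above can be sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical, loosely following the Persons and Person Phones tables from the ERD. The multivalued phone attribute moves to its own table with a primary key, and a one-to-many join reassembles the person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One primary key per table; each column holds a single value.
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY,
        name      TEXT
    );
    -- The multivalued phone attribute becomes a one-to-many child table.
    CREATE TABLE person_phones (
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER REFERENCES persons(person_id),
        phone_purpose TEXT,
        phone_number  TEXT
    );
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO person_phones VALUES (1, 11, 'mobile', '+1 (111) 111-1111');
    INSERT INTO person_phones VALUES (2, 11, 'work', '+1 (111) 111-1112');
    """
)

# Reading the whole person back requires a join across the two tables.
rows = conn.execute(
    """
    SELECT p.name, ph.phone_purpose, ph.phone_number
    FROM persons AS p
    JOIN person_phones AS ph ON ph.person_id = p.person_id
    WHERE p.person_id = ?
    ORDER BY ph.phone_id
    """,
    (11,),
).fetchall()

for row in rows:
    print(row)
```

The person's name repeats on every joined row, which is the flattening the slide describes: the entity only exists as a whole after the join.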


Diagram panels: One-to-one, One-to-many, Many-to-many, Reference Tables

DocGraph Modeling Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it


DocGraph Modeling Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
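The following is a conceptual sketch only, written in plain Python. It does not use TDE template syntax or the Optic API, and the function and predicate names (`extract_person_triples`, `hasName`, `hasEmail`) are hypothetical. It shows the general idea of projecting document values into an index and then reading them alongside a linked document without copying them into that document.

```python
# Hypothetical documents; ids and property names loosely follow the slides.
person_doc = {
    "personId": "11",
    "person": {
        "personName": "Mike Bowers",
        "personEmails": [{"emailAddress": "mike@somework.com"}],
    },
}
order_doc = {
    "orderId": "1",
    "order": {"orderNumber": "111-11-111", "customer": {"customerId": "11"}},
}


def extract_person_triples(doc):
    """Project selected person properties into (subject, predicate, object) rows."""
    subject = doc["personId"]
    triples = [(subject, "hasName", doc["person"]["personName"])]
    for email in doc["person"]["personEmails"]:
        triples.append((subject, "hasEmail", email["emailAddress"]))
    return triples


# Stand-in for an index that is populated automatically from documents.
triple_index = extract_person_triples(person_doc)


def order_with_customer_details(order, index):
    """Combine an order with indexed facts about its customer at query time."""
    customer_id = order["order"]["customer"]["customerId"]
    details = {}
    for subject, predicate, obj in index:
        if subject == customer_id:
            details.setdefault(predicate, []).append(obj)
    return {"orderNumber": order["order"]["orderNumber"], "customer": details}


view = order_with_customer_details(order_doc, triple_index)
print(view)
```

The order document itself still stores only the customer id. The name and email appear in the query result because the index links them, which is the sense in which data is denormalized without being duplicated in the documents.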


Relational Modeling Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context


Diagram panels: Business Tables, Transaction Tables, Reference Tables


DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
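As a small illustration of the last point, the sketch below keeps two lookup lists in one document and checks an order against both. The lookup values follow the slide's lookup document. The `invalid_statuses` helper and the sample orders are hypothetical.

```python
# One lookup document holding what would be two relational reference tables.
lookup_doc = {
    "orderLookupId": "111111111",
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": [
        "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock",
    ],
}


def invalid_statuses(order, lookups):
    """Return (property, value) pairs that are not in the lookup lists."""
    problems = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for product in order["orderedProducts"]:
        status = product["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


# Hypothetical orders, reduced to the properties being checked.
good_order = {
    "orderStatus": "Shipped",
    "orderedProducts": [{"productOrderStatus": "Shipped"}],
}
bad_order = {
    "orderStatus": "Lost",
    "orderedProducts": [{"productOrderStatus": "On Order"}],
}

print(invalid_statuses(good_order, lookup_doc))
print(invalid_statuses(bad_order, lookup_doc))
```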

Relational Modeling Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model


DocGraph Modeling Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects lend themselves to subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the subclassing in the full person document shown earlier under "One Person JSON Doc = 13 Tables": the person, customer, and companyRelationships sections each have a matching entry in the schemas array

Relational Modeling Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
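As a rough illustration of the naming advice above, the following Python sketch walks a nested document and reports every property name that occurs at more than one hierarchical path. The helper names (`property_paths`, `ambiguous_names`) and both sample documents are hypothetical. The sketch does not call any MarkLogic API; it only mimics the idea that a purely name-based index cannot tell same-named properties apart.

```python
from collections import defaultdict


def property_paths(doc, prefix=""):
    """Yield (name, path) for every property in a nested dict/list document."""
    if isinstance(doc, dict):
        for name, value in doc.items():
            path = f"{prefix}/{name}"
            yield name, path
            yield from property_paths(value, path)
    elif isinstance(doc, list):
        # Array items share the path of the array that holds them.
        for item in doc:
            yield from property_paths(item, prefix)


def ambiguous_names(doc):
    """Return {name: [paths]} for names that appear at more than one path."""
    paths_by_name = defaultdict(set)
    for name, path in property_paths(doc):
        paths_by_name[name].add(path)
    return {n: sorted(p) for n, p in paths_by_name.items() if len(p) > 1}


# Hypothetical untuned document: "name" means two different things.
untuned = {"person": {"name": "Mike Bowers"}, "company": {"name": "Nabisco"}}

# Tuned in the style of the slides: every property name is unique.
tuned = {"person": {"personName": "Mike Bowers"},
         "company": {"companyName": "Nabisco"}}

print(ambiguous_names(untuned))
print(ambiguous_names(tuned))
```

Prefixing each property with its entity name (personName, companyName) is what makes the second document come back clean.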

personId: 11

schemas: [
  schema: { type: person, schemaVersion: 10 }
  schema: { type: customer, schemaVersion: 10 }
  schema: { type: companyRelationships, schemaVersion: 10 }
]

triples: [
  triple: { subject: 11, predicate: hasSpouse, object: 22 }
  triple: { subject: 11, predicate: hasChild, object: 33 }
  triple: { subject: 11, predicate: hasChild, object: 44 }
  triple: { subject: 11, predicate: likesProduct, object: 1111 }
  triple: { subject: 11, predicate: hasEmployer, object: 666, tripleStatus: active, tripleCreatedOn: 2016-10-05, tripleEffectivityStartDate: 1966-06-06, tripleEffectivityEndDate: null }
]

person:
  personStatus: active
  personName: Mike Bowers
  personOnlineName: MTB1
  personNameVariations:
    personFirstName: Mike
    personLastName: Bowers
    middleName: Thomas
    personMaidenName: null
    personLegalName: Michael Thomas Bowers
    personSortedName: Bowers Mike
    personNickname: Mikey
    personInformalLetterName: Mike
    personFormalLetterName: Mike Bowers
  personBirthDate: 1981-01-01
  personGender: male
  personEthnicity: caucasian
  personShippingPreference: free

  personPhones: [
    phone: { phonePurpose: [ mobile, primary ], phoneNumber: +1 (111) 111-1111 }
    phone: { phonePurpose: [ work ], phoneNumber: +1 (111) 111-1112 }
    phone: { phonePurpose: [ home ], phoneNumber: +1 (111) 111-1113 }
  ]

  personAddresses: [
    address: { addressPurpose: [ shipping, mailing ], addressStreet: [ 99 Smith Dr ], addressLocality: Clinton, addressRegion: UT, addressPostalCode: 84015, addressCountry: United States }
  ]

  personEmails: [
    email: { emailPurpose: [ work, primary ], emailAddress: mike@somework.com }
    email: { emailPurpose: [ personal ], emailAddress: mike@somewhere.com }
  ]

  personWebsites: [
    website: { websitePurpose: [ work ], websiteUrl: http://www.mike.com }
    website: { websitePurpose: [ linkedin, primary ], websiteUrl: https://linked.com/mike }
  ]

  personPaymentMethods: [
    paymentMethod: { paymentMethodType: Visa, paymentMethodStatus: Verified, creditCardNumber: 1234123412341234, creditCardExpirationDate: 2011-11-11, nameOnCreditCard: MIKE BOWERS, creditCardValidationAddress: mailing }
    paymentMethod: { paymentMethodType: Checking, paymentMethodStatus: Verified, bankRoutingNumber: 11111111, bankAccountNumber: 11111111, nameOnBankAccount: Michael Bowers, driversLicenseNumber: 11111111, driversLicenseState: UT }
  ]

customer:
  customerStatus: active
  customerJoinDate: 2001-01-01
  customerReviewerRank: 11111

companyRelationships: [
  company:
    companyIdFk: 555
    companyName: Nabisco
    personCompanyRelationship: [ employeeOf, accountRepFor ]
    personCompanyStatus: active
    personCompanyEffectiveness: Effective
    personCompanyStartDate: 2001-01-01
    personCompanyEndDate: null
    accountRepSupportedProducts: [
      product: { productIdFk: 1111, productName: Oreos }
    ]

  company:
    companyIdFk: 666
    companyName: Goliath Dairies
    personCompanyRelationship: [ independentContractorFor, deliveryPersonFor ]
    personCompanyStatus: active
    personCompanyEffectiveness: Effective
    personCompanyStartDate: 2001-01-01
    personCompanyEndDate: null
    deliverySchedule: 2 pm Mon-Fri
    performanceNotes: He is always late
]

Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId: 11
person:
  personName: Mike Bowers
  personBirthDate: 1981-01-01
  personGender: male
  personEthnicity: caucasian
  personShippingPreference: free
  personPhones: [
    phone: { phonePurpose: [ mobile, primary ], phoneNumber: +1 (111) 111-1111 }
  ]
  personAddresses: [
    address: { addressPurpose: [ shipping, mailing ], addressStreet: [ 99 Smith Dr ], addressLocality: Clinton, addressRegion: UT, addressPostalCode: 84015, addressCountry: United States }
  ]
  personEmails: [
    email: { emailPurpose: [ work, primary ], emailAddress: mike@somework.com }
  ]
  personWebsites: [
    website: { websitePurpose: [ work ], websiteUrl: http://www.mike.com }
  ]
  personPaymentMethods: [
    paymentMethod: { paymentMethodType: Visa, paymentMethodStatus: Verified, creditCardNumber: 1234123412341234, creditCardExpirationDate: 2011-11-11, nameOnCreditCard: MIKE BOWERS, creditCardValidationAddress: mailing }
  ]
customer:
  customerStatus: active
  customerJoinDate: 2001-01-01
  customerReviewerRank: 11111

orderId: 1
order:
  orderNumber: 111-11-111
  orderDate: 2001-01-01T09:00:00Z
  orderStatus: Shipped
  customer:
    customerId: 11
    customerName: Mike Bowers
  orderShipment:
    shipper: { companyIdFk: 777, companyName: FedEx }
    shipmentMethod: 2nd Day Air
    shipmentCost: 587
    shipmentNotes: …
  orderShippingAddress:
    addressStreet: [ 99 Smith Dr ]
    addressLocality: Clinton
    addressRegion: UT
    addressPostalCode: 84015
    addressCountry: United States
  orderedProducts: [
    product:
      productIdFk: 1111
      productName: Oreo Thins Sandwich Cookies
      productOrderQty: 3
      productUnitPrice: 350
      productCondition: New
      productWeightOz: 11
      productOrderStatus: Shipped
      productShippedDate: 2001-01-01+0101
      seller: { sellerId: 555, sellerName: Nabisco }
      supplier: { supplierId: 555, supplierName: Nabisco }
    product:
      productIdFk: 2222
      productName: Rice Dream Rice Drink - Vanilla 64 Fl Oz
      productOrderQty: 1
      productUnitPrice: 2 50
      productCondition: New
      …

productId: 1111
product:
  productCode: oreo-111
  productStatus: active
  productName: Oreo Thins Sandwich Cookies
  productDescription: YUMMY
  productCategories: [ cookies chocolate cookies desserts snacks …
  productTags: [ cookie, chocolate, snack, treat ]
  productDetailDescriptionURL: httpmysalessitecomcdndetailsoreo-111
  productAvailabilityDate: 2001-01-01
  productListPrice: 450
  productDiscountPrice: 350
  productStandardCost: 250
  productInventoryReorderLevel: 100
  productInventoryTargetLevel: 1000
  productVendor: { vendorId: 555, companyName: Nabisco }
  productSuppliers: [
    productSupplier:
      supplierId: 555
      companyName: Nabisco
      standardSupplierProductPrice: 250
      currentSupplierProductPrice: 2…
      supplierProductQuantityPerUnit: 1 box of 35 cookies
      supplierProdu…
  productMeasures: [
    productMeasure: { productMeasurePurpose: productPackage, productWeight: 101, productWidth: 45, productHeight: 17, productLength: 67 }
    productMeasure: { productMeasurePurpose: shippingPackage, productWeight: 1, productWidth: 5, productHeight: 2, productLength: 8 }
  …

companyId: 555
company:
  companyName: Nabisco
  companyStatus: active
  companyPhones: [
    phone: phonePurpose: sales …
    phone: phonePurpose: support …
    phone: phonePurpose: shipping …
  companyAddresses: [
    address:
      addressPurpose: [ shipping …
      addressLocality: East Hanove…
      addressPostalCode: 07936
  companyEmails: [
    email: emailPurpose: sales …
  companyWebsites: [
    website: websitePurpose: home …
  companyPaymentMethods: [
    paymentMethod:
      paymentMethodType: Che…
      paymentMethodAccountNumber: 555
      paymentMethodVerified: true
supplier: supplierJoinDate: 2005-05…
seller: sellerJoinDate: 2005-05…

orderLookupId: 111111111
orderStatus: [ New, Invoiced, Shipped, Closed ]
productOrderStatus: [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Document PROs Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
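The IO arithmetic on this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the per-join figures quoted above (3-to-4 index IOs plus 1-to-2 row IOs); the function name `relational_io_range` and the constant `DOCUMENT_IOS` are made up for the illustration, and real IO counts depend on caching and storage layout.

```python
def relational_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return (low, high) IO estimates for reading one entity spread over `joins` tables."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


# The slide's claim for a document database whose indexes are held in RAM.
DOCUMENT_IOS = 1

low, high = relational_io_range(13)
print(f"relational: {low}-to-{high} IOs, document: {DOCUMENT_IOS} IO")
```

With 13 joins the function returns the same 52-to-78 range the slide states.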


Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses: [
  address:
    addressPurpose: [ shipping, mailing ]
    addressStreet: [ 99 Smith Dr ]
    addressLocality: Clinton
    addressRegion: UT
    addressPostalCode: 84015
    addressCountry: United States
]

Equivalent relational tables:

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
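To make the contrast concrete, here is a minimal Python sketch that shreds the multi-valued person document into rows for the three tables listed above. The function `shred_person`, the snake_case column names, and the generated address IDs are assumptions made for the illustration; only a few of the Persons columns are carried over.

```python
def shred_person(person_id, person, first_address_id=1):
    """Split one person document into rows for three relational tables."""
    persons = [{"person_id": person_id, "name": person["personName"]}]
    addresses = []
    persons_addresses = []
    for offset, address in enumerate(person.get("personAddresses", [])):
        address_id = first_address_id + offset
        addresses.append({
            "address_id": address_id,
            "street": " ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # The multi-valued addressPurpose array becomes one junction row
        # per purpose in the relational form.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "person_id": person_id,
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, addresses, persons_addresses


person = {
    "personName": "Mike Bowers",
    "personAddresses": [{
        "addressPurpose": ["shipping", "mailing"],
        "addressStreet": ["99 Smith Dr"],
        "addressLocality": "Clinton",
        "addressRegion": "UT",
        "addressPostalCode": "84015",
        "addressCountry": "United States",
    }],
}

persons, addresses, persons_addresses = shred_person(11, person)
print(len(persons), len(addresses), len(persons_addresses))
```

One nested array with two purposes already needs three tables and four rows once it is shredded.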

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed

• Different types of records in the same array is natural to do in programming but very difficult to do in relational databases

personPaymentMethods: [
  paymentMethod:
    paymentMethodType: Visa
    paymentMethodStatus: Verified
    creditCardNumber: 1234123412341234
    creditCardExpirationDate: 2011-11-11
    nameOnCreditCard: MIKE BOWERS
    creditCardValidationAddress: mailing

  paymentMethod:
    paymentMethodType: Checking
    paymentMethodStatus: Verified
    bankRoutingNumber: 11111111
    bankAccountNumber: 11111111
    nameOnBankAccount: Michael Bowers
    driversLicenseNumber: 11111111
    driversLicenseState: UT
]
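The following Python sketch shows why a mixed-type array is natural to handle in code: each record is inspected for the properties it actually carries. The function `describe_payment_method` and its output wording are hypothetical; the property names and values are taken from the example above.

```python
def describe_payment_method(method):
    """Summarize a payment record based on which properties it contains."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


person_payment_methods = [
    {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
     "creditCardNumber": "1234123412341234",
     "creditCardExpirationDate": "2011-11-11"},
    {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
     "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"},
]

summaries = [describe_payment_method(m) for m in person_payment_methods]
print(summaries)
```

In a relational design the same two records would typically need either a wide table full of NULLs or one table per payment type.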


Document PROs Everything is Data

Everything is data in JSON

• Structure
• Keys
• Values
• Arrays

Because structure is data

• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values

personId: 11

person:
  personName: Some very long name with many UTF-8 international characters…
  personBirthDate: 1982-02-02
  personHeightInFeet: 5.75
  personPhones: [
    phone: { phonePurpose: [ mobile ], phoneNumber: +1 (222) 222-2222 }
    phone: { phonePurpose: [ home ], phoneNumber: +1 (222) 222-2224, hidePhoneNumber: true }
  ]
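Querying structure "for presence of keys" can be sketched in plain Python as a recursive search over the document. The helper `find_key` and its path notation are assumptions for this illustration and are not a MarkLogic API; the sample data mirrors the person document above.

```python
def find_key(doc, key, path=""):
    """Return the path of every place where `key` is present in a nested document."""
    hits = []
    if isinstance(doc, dict):
        for name, value in doc.items():
            child = f"{path}/{name}"
            if name == key:
                hits.append(child)
            hits.extend(find_key(value, key, child))
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            hits.extend(find_key(item, key, f"{path}[{index}]"))
    return hits


document = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}

before = find_key(document, "hidePhoneNumber")

# Changing structure is just a data update: add the key to the first phone.
document["person"]["personPhones"][0]["hidePhoneNumber"] = False
after = find_key(document, "hidePhoneNumber")
print(before, after)
```

No schema change is involved: the second search finds the new key because the structure itself was updated as data.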

Document PROs Simple Easy Data Types

– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have unlimited number of attributes, can be nested, and can be sparsely populated
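A short Python sketch using the standard `json` module shows how these JSON types map onto a programming language's native types. The sample document, including the name and the `mixedArray` property, is invented for the illustration.

```python
import json

text = """
{
  "personName": "Zoë Müller 山田",
  "personHeightInFeet": 5.75,
  "personChildCount": 2,
  "hidePhoneNumber": true,
  "personMaidenName": null,
  "mixedArray": ["text", 1, 2.5, false, null, {"nested": []}],
  "person": {"sparse": {}}
}
"""

document = json.loads(text)

# Map each top-level property to the Python type the parser chose for it.
python_types = {key: type(value).__name__ for key, value in document.items()}
array_types = [type(value).__name__ for value in document["mixedArray"]]

print(python_types)
print(array_types)
```

Strings, numbers, booleans, null, arrays, and objects each arrive as a distinct native type, and a single array can hold all of them at once.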

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: knowledge graph of connected nodes. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order multi-model diagram. JSON document sets (Customers, Orders, Products, Vendors) and an XML Documentation set are connected by labeled relationships: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product. The XML sample is a hospital operation record (Hospital Name: Johns Hopkins, Operation Number: 13, Operation Type: Heart Transplant, Surgeon Name: Dorothy Oz) with a drug table of Drug Name, Drug Manufacturer, Dose Size, Dose UOM: Minicillan, Drugs R Us, 200 mg; Maxicillan, Canada4Less Drugs, 400 mg; Minicillan, Drug USA, 150 mg.]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: Johns Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size    Dose UOM
Minicillan    Drugs R Us           200          mg
Maxicillan    Canada4Less Drugs    400          mg
Minicillan    Drug USA             150          mg



RDF Graph and XML enable us to turn content into meaningful knowledge

What can RDF Graph and JSON do for data?

Documents + RDF Graphs = NextGen Relational

• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity

• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time across all table boundaries

[Figure: Customer Order multi-model diagram. JSON document sets (Customers, Orders, Products, Vendors) and an XML Documentation set are connected by labeled relationships: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents, invoices, etc.; Product manual; Vendor order forms, invoices, etc.; Customer liked product; Vendor received RMA on product. The XML sample is a hospital operation record (Hospital Name: Johns Hopkins, Operation Number: 13, Operation Type: Heart Transplant, Surgeon Name: Dorothy Oz) with a drug table of Drug Name, Drug Manufacturer, Dose Size, Dose UOM: Minicillan, Drugs R Us, 200 mg; Maxicillan, Canada4Less Drugs, 400 mg; Minicillan, Drug USA, 150 mg.]
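The relationships in the diagram can be sketched as subject-predicate-object triples. The Python below is an in-memory illustration only: the `Triple` type, the `match` helper, the identifier strings, and the predicate names are all hypothetical, and a real deployment would store the triples in a triple store and query them with SPARQL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical identifiers and predicate names modeled on the diagram labels.
TRIPLES = [
    Triple("customer/11", "likedProduct", "product/1111"),
    Triple("order/1", "orderedProduct", "product/1111"),
    Triple("vendor/555", "soldProduct", "product/1111"),
    Triple("vendor/555", "receivedRmaOn", "product/1111"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching every part that is given; None is a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# Everything connected to one product, across all entity types.
about_product = match(TRIPLES, obj="product/1111")
vendor_links = match(TRIPLES, subject="vendor/555")
print([t.subject for t in about_product])
print([t.predicate for t in vendor_links])
```

Adding a new kind of relationship is just appending another triple; no table or foreign key has to be defined first.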

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names

• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics of a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.
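The three kinds of property typing described above (a fixed type, one of several predefined types, or any type) can be sketched as a small rule checker in Python. The `ANY` marker, the `conforms` function, and the `RULES` table are assumptions invented for this illustration and do not correspond to any particular database's schema language.

```python
ANY = object()  # marker meaning "any type is acceptable"


def conforms(value, rule):
    """Check a value against a rule: a type, a tuple of types, or ANY."""
    if rule is ANY:
        return True
    allowed = rule if isinstance(rule, tuple) else (rule,)
    return isinstance(value, allowed)


# Hypothetical rules for the drug properties used elsewhere in the slides.
RULES = {
    "drugName": str,                               # fixed type
    "drugDoseSize": (int, float, type(None)),      # one of several types; null allowed
    "drugNotes": ANY,                              # any type
}

sample = {"drugName": "Minocycline", "drugDoseSize": None, "drugNotes": ["free", 1]}
results = {name: conforms(sample[name], rule) for name, rule in RULES.items()}
print(results)
```

Treating null as a type of its own, rather than as a missing value, is what lets `drugDoseSize` pass here.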


Why is relational fixed and dense?

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Fixed field types

Fixed table structure

Fixed table relationships

Each row has same structure

Fixed set of tables

How does NoSQL support variety and variability?

NoSQL documents can have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": "21187" },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

The second administeredDrugs entry omits the dose properties (a sparse property).

Variable Data Types

Variable Document Types

Variable Relationships

Variable and Sparse Collections

Variable Document Structures

Sparse Properties

Sparse Denormalized Properties
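As a rough illustration of "structure is data", the sketch below keeps documents of differing shape in one list and reads them without a predeclared schema. The list stands in for a collection and the helper name `total_dose_mg` is invented for this example; it is not part of any database API.

```python
# Hypothetical in-memory "collection": a plain list of dicts, not a real document store.
operations = [
    {"_type": "operation", "operationNumber": 13,
     "administeredDrugs": [
         {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"},
         {"drugName": "Minomycin"},  # sparse: no dose properties recorded
         {"drugName": "Minocycline", "drugDoseSize": 150, "drugDoseUOM": "mg"},
     ]},
    {"_type": "operation", "operationNumber": 14},  # no administeredDrugs property at all
]


def total_dose_mg(doc):
    """Sum the recorded doses in mg, skipping drugs that omit the dose properties."""
    return sum(
        drug["drugDoseSize"]
        for drug in doc.get("administeredDrugs", [])
        if drug.get("drugDoseUOM") == "mg"
    )


doses = [total_dose_mg(doc) for doc in operations]
print(doses)
```

Missing properties are simply absent rather than stored as NULL columns, so the reader of the data decides how to treat them.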

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date <date xsi:type="xsd:date">2017-04-02Z</date>, to be freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects." },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." }
    ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA.
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything.
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures
Developers work with data with maximal reliance on predictable structures

XML is ideal for content
Developers work with tagged content with minimal reliance on variable structure

Choosing Between XML and JSON
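The difference between tags mixed into text and data held in separate objects can be sketched with Python's standard library. The two samples below are shortened versions of the slide's examples, and the variable names are chosen only for this illustration.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text.</paragraph>"
)
json_text = json.dumps({"paragraph": [
    {"type": "text", "value": "Data exists in separate objects"},
    {"type": "date", "value": "2017-04-02Z"},
]})

# XML: the date is a tag sitting inside running text (mixed content).
paragraph = ET.fromstring(xml_text)
full_text = "".join(paragraph.itertext())  # text with the tags dropped
xml_date = paragraph.find("date").text

# JSON: text and data are separate objects in an array.
items = json.loads(json_text)["paragraph"]
json_date = next(item["value"] for item in items if item["type"] == "date")

print(full_text)
print(xml_date, json_date)
```

In the XML case the sentence survives intact when the markup is removed; in the JSON case the sentence has to be reassembled from the objects, which is why the slide recommends XML for content and JSON for data structures.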

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.

Person ERD

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 10 } },
    { "schema": { "type": "customer", "schemaVersion": 10 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 10 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [
                     { "product": { "productIdFk": 1111, "productName": "Oreos" } }
                   ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
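To make the "one document versus many tables" point concrete, here is a minimal sketch that shreds a cut-down person document into table-like row lists, one per multivalued property. The `shred` function and the table naming are assumptions made for this illustration, not a MarkLogic or SQL feature.

```python
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
        ],
        "personAddresses": [
            {"addressPurpose": ["shipping", "mailing"], "addressLocality": "Clinton"},
        ],
    },
}


def shred(doc):
    """Split one document into row lists: a parent table plus one table per array property."""
    body = doc["person"]
    key = {"personId": doc["personId"]}
    # Single-valued properties stay on the parent row.
    tables = {"persons": [{**key, **{k: v for k, v in body.items() if not isinstance(v, list)}}]}
    # Each multivalued property becomes its own table with a foreign key back to the parent.
    for prop, values in body.items():
        if isinstance(values, list):
            tables[prop] = [{**key, **value} for value in values]
    return tables


tables = shred(person)
print(sorted(tables))
print(len(tables["personPhones"]))
```

Note that `phonePurpose` is still an array inside each phone row, so a fully normalized relational design would need yet another table for it, which is how the table count grows.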

[Figure: entity-relationship diagram of a larger customer order model with many more tables, including Order, Order Items, Fact, and Lookup Lists]

More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities

[Figure: four sets of JSON documents (Customers, Orders, Products, Vendors) linked by labeled relationships: "Customer Order", "Product that was ordered", "Vendor who sold the product"]
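The idea that relationships are data connecting whole business entities can be sketched with plain tuples. The predicate names (`placedOrder`, `orderedProduct`, `soldByVendor`) are invented for this example; the ids and names reuse the slide's sample values.

```python
# Hypothetical documents keyed by id.
docs = {
    11: {"type": "customer", "personName": "Mike Bowers"},
    1: {"type": "order", "orderNumber": "111-11-111"},
    1111: {"type": "product", "productName": "Oreo Thins Sandwich Cookies"},
    555: {"type": "vendor", "companyName": "Nabisco"},
}

# Each relationship is a (subject, predicate, object) triple that names its own meaning.
triples = [
    (11, "placedOrder", 1),
    (1, "orderedProduct", 1111),
    (1111, "soldByVendor", 555),
]


def objects(subject, predicate):
    """Return the ids reachable from `subject` over `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def vendors_for_customer(customer_id):
    """Walk customer -> order -> product -> vendor and collect the vendor names."""
    return [
        docs[vendor_id]["companyName"]
        for order_id in objects(customer_id, "placedOrder")
        for product_id in objects(order_id, "orderedProduct")
        for vendor_id in objects(product_id, "soldByVendor")
    ]


vendor_names = vendors_for_customer(11)
print(vendor_names)
```

Adding a new kind of relationship only means adding triples with a new predicate; none of the documents change shape.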

(The product and company documents are cut off at the right edge of the slide; cut-off lines are shown as they appear.)

Customer document

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

Order document

  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes"
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New"

Product document

  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks"
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco"
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco"
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1
          "productWidth": 5, "productHeight": 2, "productLength": 8

Company (vendor) document

  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales"
      { "phone": { "phonePurpose": "support"
      { "phone": { "phonePurpose": "shipping"
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping"
          "addressLocality": "East Hanove
          "addressPostalCode": "07936"
    "companyEmails": [
      { "email": { "emailPurpose": "sales"
    "companyWebsites": [
      { "website": { "websitePurpose": "home"
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che
          "paymentMethodAccountNumber": 555
          "paymentMethodVerified": true
  "supplier": { "supplierJoinDate": "2005-05
  "seller": { "sellerJoinDate": "2005-05

Lookup document

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}

Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure panels: One-to-one, One-to-many, Many-to-many, Reference Tables]
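The normalization steps above can be sketched with SQLite from Python's standard library. The table and column names are assumptions chosen for this example: the multivalued phone attribute moves into its own table with a primary key and a foreign key, and a join reassembles the person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per coherent context, one primary key per table.
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
    -- The multivalued attribute becomes a one-to-many child table.
    CREATE TABLE person_phones (
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER REFERENCES persons(person_id),
        phone_purpose TEXT,
        phone_number  TEXT
    );
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO person_phones VALUES (1, 11, 'mobile', '+1 (111) 111-1111');
    INSERT INTO person_phones VALUES (2, 11, 'work', '+1 (111) 111-1112');
""")

# Reading the person back requires a join across the two tables.
rows = conn.execute("""
    SELECT p.name, ph.phone_purpose, ph.phone_number
    FROM persons AS p
    JOIN person_phones AS ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
print(rows)
```

Every additional multivalued attribute (addresses, emails, websites, payment methods) repeats this pattern with another table and another join.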

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
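The sketch below imitates the idea in plain Python; it does not use TDE or the Optic API, and every function name in it is invented. Triples are projected out of the documents into a separate index, and a query joins across that index so a linked person's name can be read without copying it into the other document. The spouse's name is a made-up placeholder.

```python
documents = [
    {"personId": 11,
     "triples": [{"subject": 11, "predicate": "hasSpouse", "object": 22}],
     "person": {"personName": "Mike Bowers"}},
    {"personId": 22, "triples": [],
     "person": {"personName": "Example Spouse"}},  # hypothetical second document
]


def extract_triples(doc):
    """Project a document into triples: its embedded links plus one triple for its name."""
    for t in doc["triples"]:
        yield (t["subject"], t["predicate"], t["object"])
    yield (doc["personId"], "personName", doc["person"]["personName"])


# The stand-in "triple index": derived from the documents, never written back into them.
index = [triple for doc in documents for triple in extract_triples(doc)]


def spouse_names(person_id):
    """Join hasSpouse links to personName triples, as if the name lived in the linking document."""
    spouse_ids = [o for s, p, o in index if s == person_id and p == "hasSpouse"]
    return [o for s, p, o in index if s in spouse_ids and p == "personName"]


names = spouse_names(11)
print(names)
```

The documents stay normalized (the name is stored once, in document 22), while the derived index gives queries the convenience of denormalized data.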

Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure panels: Business Tables, Transaction Tables, Reference Tables]

DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity.
• A business entity should exactly match how users think of it.
• A transaction should contain everything about the transaction.
• Multiple lookup tables may be combined in one doc.

Relational Modeling Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change.
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model.

DocGraph Modeling Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing.
• This example shows a person being subclassed as an employee, customer, and customer rep.
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model.
• See subclassing below.

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
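The subclassing shown in the document above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each role (customer, companyRelationships) is taken to be an optional section next to person, the sample document is a trimmed version of the slide's, and the helper name roles_of is hypothetical.

```python
import json

# Trimmed, hypothetical version of the person document from the slide.
PERSON_DOC = json.loads("""
{
  "personId": 11,
  "person": {"personName": "Mike Bowers", "personStatus": "active"},
  "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
  "companyRelationships": [
    {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                 "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
    {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                 "personCompanyRelationship": ["independentContractorFor",
                                               "deliveryPersonFor"]}}
  ]
}
""")


def roles_of(doc):
    """Return the roles a person document plays, based on which sections exist."""
    roles = ["person"] if "person" in doc else []
    if "customer" in doc:
        roles.append("customer")
    # Each company relationship contributes its own list of roles.
    for item in doc.get("companyRelationships", []):
        roles.extend(item["company"]["personCompanyRelationship"])
    return roles


print(roles_of(PERSON_DOC))
```

Adding a new role means adding a new optional section to the document; documents without that section are unaffected.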

Relational Modeling Tune

4. Tune
• Tune the model to meet the performance requirements of the application.
• Optimize sparse data out of a table and put it into one-to-one related tables.
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance.

DocGraph Modeling Tune

• These examples are tuned for MarkLogic.
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.
• You manually create inequality (range) indexes based on property name or hierarchical path.
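To see why unique property names matter, here is a toy Python model of a name-based equality index. It is a sketch for intuition only, not MarkLogic's actual index implementation; the function names, document URIs, and trimmed sample documents are assumptions.

```python
from collections import defaultdict


def walk(node, name=None):
    """Yield (property name, scalar value) pairs found at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, key)
    elif isinstance(node, list):
        # Array items inherit the name of the property that holds the array.
        for item in node:
            yield from walk(item, name)
    elif name is not None:
        yield name, node


def build_equality_index(docs):
    """Map (property name, value) to the set of document URIs containing it."""
    index = defaultdict(set)
    for uri, doc in docs.items():
        for name, value in walk(doc):
            index[(name, value)].add(uri)
    return index


DOCS = {
    "/person/11.json": {
        "personId": 11,
        "person": {
            "personName": "Mike Bowers",
            "personPhones": [
                {"phone": {"phonePurpose": ["mobile", "primary"],
                           "phoneNumber": "+1 (111) 111-1111"}}
            ],
        },
    },
    "/order/1.json": {
        "orderId": 1,
        "order": {"customer": {"customerId": 11, "customerName": "Mike Bowers"}},
    },
}

index = build_equality_index(DOCS)
print(sorted(index[("personName", "Mike Bowers")]))
print(sorted(index[("customerName", "Mike Bowers")]))
```

Because the index key ignores the position in the hierarchy, a generic name such as "name" in both documents would make the two lookups indistinguishable; personName and customerName keep them apart.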


Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.


Document PROs Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
• 4-to-6 IOs x 13 joins = 52-to-78 IOs.
• MarkLogic indexes are in RAM, so getting a doc is 1 IO.
• MongoDB indexes are B-tree indexes and may require 4 IOs.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
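The IO estimate above is simple arithmetic, worked here in Python as a check. The per-join figures are the slide's own; the constant and function names are made up for the example.

```python
TABLES_TO_JOIN = 13          # tables needed for one person (from the slide)
INDEX_IOS_PER_JOIN = (3, 4)  # low and high estimate for index reads
ROW_IOS_PER_JOIN = (1, 2)    # low and high estimate for row reads


def relational_io_range(joins, index_ios, row_ios):
    """Return the (low, high) IO estimate for retrieving one entity via joins."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


low, high = relational_io_range(TABLES_TO_JOIN, INDEX_IOS_PER_JOIN, ROW_IOS_PER_JOIN)
print(f"relational: {low}-to-{high} IOs; document: 1 IO")
```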


Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]

The same data in a relational model takes three tables:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
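As a rough illustration of working with multi-valued properties, the Python sketch below filters a person's addresses by purpose without any join table. The document shape follows the address example above; the second (billing) address and the helper name addresses_for are hypothetical additions.

```python
PERSON = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
        }},
        # Hypothetical second address, added only to show the filter at work.
        {"address": {
            "addressPurpose": ["billing"],
            "addressStreet": ["1 Example Way", "Suite 2"],
            "addressLocality": "Ogden",
            "addressRegion": "UT",
        }},
    ],
}


def addresses_for(doc, purpose):
    """Return every address in the document that lists the given purpose."""
    return [
        item["address"]
        for item in doc.get("personAddresses", [])
        if purpose in item["address"].get("addressPurpose", [])
    ]


for address in addresses_for(PERSON, "mailing"):
    print(", ".join(address["addressStreet"]), address["addressLocality"])
```

Both addressPurpose and addressStreet are arrays, so one address can serve several purposes and span several street lines without extra tables.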

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult to do in relational databases.

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa",
      "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234",
      "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS",
      "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking",
      "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111",
      "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111",
      "driversLicenseState": "UT" } }
]
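A small Python sketch shows how application code might consume such a multi-typed array. It assumes the account numbers are stored as strings and that each record can be recognized by the properties it carries; the function name describe is hypothetical.

```python
PAYMENT_METHODS = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "nameOnCreditCard": "MIKE BOWERS",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    }},
]


def describe(payment_method):
    """Summarize one record, branching on which properties it actually has."""
    kind = payment_method["paymentMethodType"]
    if "creditCardNumber" in payment_method:
        return f"{kind} card ending in {payment_method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in payment_method:
        return f"{kind} account ending in {payment_method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


summaries = [describe(item["paymentMethod"]) for item in PAYMENT_METHODS]
print(summaries)
```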


Document PROs Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query.
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
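Querying structure can be sketched in plain Python, independent of any database. The function below reports every path at which a given key is present; the name paths_with_key, the path notation, and the sample document are assumptions made for the example.

```python
def paths_with_key(node, key, path="$"):
    """Yield a path string for every occurrence of key at any nesting depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}.{name}"
            if name == key:
                yield child
            yield from paths_with_key(value, key, child)
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from paths_with_key(item, key, f"{path}[{position}]")


DOC = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ]
    },
}

print(list(paths_with_key(DOC, "hidePhoneNumber")))
print(list(paths_with_key(DOC, "phoneNumber")))
```

Changing structure is the same kind of operation: assigning a new key on one of these nested dictionaries adds a property to that document only.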

Document PROs Simple Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have unlimited number of attributes, can be nested, and can be sparsely populated

{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
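The handful of JSON data types maps directly onto native types in most languages. The Python sketch below parses a small document and reports the type each value becomes; the sample values, including the international name and the mixed array, are invented for illustration.

```python
import json

# Hypothetical sample; \u escapes are decoded by the JSON parser.
RAW = r"""
{
  "personName": "Zo\u00eb M\u00fcller",
  "personId": 11,
  "personHeightInFeet": 5.75,
  "hidePhoneNumber": true,
  "mixed": ["text", 1, 2.5, false, null, {"nested": {}}]
}
"""

doc = json.loads(RAW)

# Type of each top-level value, then of each item in the mixed array.
value_types = {name: type(value).__name__ for name, value in doc.items()}
mixed_types = [type(item).__name__ for item in doc["mixed"]]

print(value_types)
print(mixed_types)
```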

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. Codd
IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths.

[Figure: graph of connected entities. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. JSON documents for Customers, Orders, Products, and Vendors, plus XML Documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manual), connected by relationships: Product that was ordered; Vendor who sold the product; Customer liked product; Vendor received RMA on product.]

XML example in the figure:

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name  | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us        | 200       | mg
Maxicillan | Canada4Less Drugs | 400       | mg
Minicillan | Drug USA          | 150       | mg

Inside and Out


Hospital Name:     John Hopkins
Operation Number:  13
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size   Dose UOM
Minicillan    Drugs R Us           200         mg
Maxicillan    Canada4Less Drugs    400         mg
Minicillan    Drug USA             150         mg


Documents + RDF Graphs = NextGen Relational

  • A JSON document is a business entity
      – A document is like a row in a table, and more
      – A document contains all related tables required by a relational model to represent one business entity

  • Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries
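The two bullets above can be sketched in a few lines of plain Python. This is only an illustration, not MarkLogic code: each dict stands in for one JSON document, and each tuple stands in for one graph relationship. The predicate names other than likesProduct, and the helper `related`, are assumptions made up for this sketch.

```python
# Each dict stands in for one JSON document, i.e. one business entity.
documents = {
    11: {"type": "customer", "name": "Mike Bowers"},
    1111: {"type": "product", "name": "Oreo Thins Sandwich Cookies"},
    555: {"type": "vendor", "name": "Nabisco"},
}

# Relationships are data: (subject id, predicate, object id) triples.
# "soldProduct" and "receivedRmaOn" are hypothetical predicate names.
triples = [
    (11, "likesProduct", 1111),
    (555, "soldProduct", 1111),
    (555, "receivedRmaOn", 1111),
]


def related(subject_id, predicate):
    """Return the documents linked from subject_id by the given predicate."""
    return [documents[o] for s, p, o in triples if s == subject_id and p == predicate]


# A new kind of relationship is added at run time, with no schema change.
triples.append((11, "reviewedProduct", 1111))

liked = related(11, "likesProduct")
print(liked[0]["name"])
```

The point of the sketch is that adding "reviewedProduct" required no new table, column, or foreign key.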

[Figure: Customer Order document graph. JSON documents for Customers, Orders, Products, and Vendors, plus XML Documentation (the sample shown is the hospital operation record with its administered drugs), connected by labeled relationships: "Product that was ordered"; "Vendor who sold the product"; "Shipping instructions, etc."; "Customer invoices, annual reports, etc."; "Customer order documents, invoices, etc."; "Product manual"; "Vendor order forms, invoices, etc."; "Customer liked product"; "Vendor received RMA on product".]

RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names

  • Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

  • Dublin Core
  • FOAF
  • TrackBack
  • MetaVocab
  • Basic Geo Vocabulary
  • BIO
  • RSS 1.0
  • VCard RDF
  • Creative Commons metadata
  • WOT
  • SIOC
  • GoodRelations
  • DOAP
  • Programmes Ontology
  • Music Ontology
  • OpenGUID
  • Provenance Vocabulary
  • Pedagogical diagnosis
  • DILIGENT Argumentation
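As a rough illustration of reusing predicate names, the sketch below writes three triples in N-Triples form using FOAF and Dublin Core Terms predicates. The `http://example.org/` namespace, the subject IRIs, and the helper `to_ntriples` are assumptions for this example only.

```python
# Namespaces of two published vocabularies from the list above.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical namespace for our own entities.
EX = "http://example.org/"

triples = [
    (EX + "person/11", FOAF + "name", '"Mike Bowers"'),
    (EX + "person/11", FOAF + "knows", EX + "person/22"),
    (EX + "doc/slides", DCTERMS + "creator", EX + "person/11"),
]


def to_ntriples(subject, predicate, obj):
    """Serialize one triple; objects starting with a quote are plain literals."""
    rendered = obj if obj.startswith('"') else f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered} ."


lines = [to_ntriples(*t) for t in triples]
print("\n".join(lines))
```

Because the predicates come from shared vocabularies, another team can read "knows" or "creator" without asking what a home-grown column name means.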

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

1. Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
   NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

2. Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
   NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

3. Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
   NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

4. Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
   NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

5. Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
   NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.

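The NoSQL side of the comparison (fixed properties, optional properties, and properties that may be one of several types or any type) can be sketched as a small check. The rule format and the function `conforms` are assumptions invented for this illustration, not a real schema language.

```python
def conforms(document, fixed, optional):
    """Check a document against fixed and optional property rules.

    Each rule maps a property name to a tuple of allowed types;
    an empty tuple means "any type". Unlisted properties are allowed.
    """
    for name, types in fixed.items():
        if name not in document:
            return False  # fixed properties must be present
        if types and not isinstance(document[name], types):
            return False
    for name, types in optional.items():
        if name in document and types and not isinstance(document[name], types):
            return False  # optional properties are checked only when present
    return True


# Hypothetical rules: drugName is fixed; the dose may be a number or null.
fixed = {"drugName": (str,)}
optional = {"drugDoseSize": (int, float, type(None)), "notes": ()}

dense = {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"}
sparse = {"drugName": "Minomycin"}
wrong = {"drugName": "Minocycline", "drugDoseSize": "200"}

print(conforms(dense, fixed, optional), conforms(sparse, fixed, optional),
      conforms(wrong, fixed, optional))
```

Both the dense and the sparse document pass, which is the behavior a relational row with a fixed column set cannot express without nullable columns or extra tables.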

Why is relational fixed and dense?

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Callouts:
  • Fixed field types
  • Fixed table structure
  • Fixed table relationships
  • Each row has same structure
  • Fixed set of tables
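A minimal sketch of the same tables in SQLite (via Python's standard `sqlite3` module) shows what "fixed" means in practice: the structure must exist before any row is loaded, reading one operation takes three joins, and a new attribute takes a schema change. The table and column names are assumptions, and the slide's "Sparse" column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The structure must be declared before any data can be loaded.
conn.executescript("""
CREATE TABLE surgeon (
    surgeon_id INTEGER PRIMARY KEY,
    surgeon_name TEXT NOT NULL,
    surgeon_specialty TEXT NOT NULL);
CREATE TABLE drug (drug_id INTEGER PRIMARY KEY, drug_name TEXT NOT NULL);
CREATE TABLE operation (
    operation_id INTEGER PRIMARY KEY,
    surgeon_id INTEGER NOT NULL REFERENCES surgeon,
    hospital_id INTEGER NOT NULL);
CREATE TABLE operation_drug (
    operation_id INTEGER NOT NULL REFERENCES operation,
    drug_id INTEGER NOT NULL REFERENCES drug,
    drug_dose INTEGER,          -- nullable, i.e. sparse
    drug_dose_uom TEXT);        -- nullable, i.e. sparse
INSERT INTO surgeon VALUES (1, 'Dorothy Oz', 'Cardiothoracic'),
                           (2, 'Martin Fields', 'Neurosurgery');
INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
INSERT INTO operation VALUES (13, 1, 7);
INSERT INTO operation_drug VALUES (13, 100, 200, 'mg'),
                                  (13, 101, NULL, NULL),
                                  (13, 100, 150, 'mg');
""")

# Reassembling one operation requires joining every table it was shredded into.
rows = conn.execute("""
SELECT s.surgeon_name, d.drug_name, od.drug_dose, od.drug_dose_uom
FROM operation o
JOIN surgeon s ON s.surgeon_id = o.surgeon_id
JOIN operation_drug od ON od.operation_id = o.operation_id
JOIN drug d ON d.drug_id = od.drug_id
WHERE o.operation_id = 13
ORDER BY od.rowid
""").fetchall()

# Adding an attribute means changing the table's fixed structure.
conn.execute("ALTER TABLE operation_drug ADD COLUMN drug_manufacturer TEXT")
columns = [r[1] for r in conn.execute("PRAGMA table_info(operation_drug)")]
print(rows, columns)
```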

How does NoSQL support variety and variability?

NoSQL documents have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Callouts:
  • Variable Data Types
  • Variable Document Types
  • Variable Relationships
  • Variable and Sparse Collections
  • Variable Document Structures
  • Sparse Properties (the second administered drug has no dose size or dose UOM)
  • Sparse Denormalized Properties
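Reading a document with sparse properties mostly comes down to not assuming a property is present. The sketch below loads a trimmed copy of the operation document and formats each administered drug; the helper `describe_dose` and its wording are assumptions for this example.

```python
import json

# Trimmed copy of the operation document; the second drug omits its dose.
operation_doc = json.loads("""
{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "surgeonName": "Dorothy Oz",
    "administeredDrugs": [
      {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
       "drugDoseSize": 200, "drugDoseUOM": "mg"},
      {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
      {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
       "drugDoseSize": 150, "drugDoseUOM": "mg"}
    ]
  }
}
""")


def describe_dose(drug):
    """Format one administered drug, tolerating a missing (sparse) dose."""
    size = drug.get("drugDoseSize")
    if size is None:
        return f"{drug['drugName']}: dose not recorded"
    return f"{drug['drugName']}: {size} {drug.get('drugDoseUOM', '')}".rstrip()


doses = [describe_dose(d) for d in operation_doc["operation"]["administeredDrugs"]]
print(doses)
```

No NULL placeholder was stored for the missing dose; the property is simply absent from that subdocument.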

Choosing Between XML and JSON

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date, <date xsi:type="xsd:date">2017-04-02Z</date>, to be
    freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects:" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.
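The difference between "tags added on top of text" and "text poured into objects" shows up as soon as you parse the two samples. The sketch below uses only Python's standard library; both snippets are shortened from the slide, and the XML date attribute is simplified to an un-namespaced `type` to keep the example small.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<paragraph>XML allows data, such as this date, "
    '<date type="date">2017-04-02Z</date>, to be freely mixed into the text.'
    "</paragraph>"
)
json_text = """
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects: "},
  {"type": "date", "value": "2017-04-02Z"}
]}
"""

# XML: the date is a tagged span inside running text (mixed content),
# so the plain text is recovered by walking all text nodes in order.
root = ET.fromstring(xml_text)
xml_plain = "".join(root.itertext())
xml_date = root.find("date").text

# JSON: every piece of text or data is its own object in an array,
# so the reader relies on the predictable "type"/"value" structure.
parts = json.loads(json_text)["paragraph"]
json_plain = "".join(p["value"] for p in parts)
json_date = next(p["value"] for p in parts if p["type"] == "date")

print(xml_plain)
print(json_plain)
```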

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Tables and their columns:

  • Addresses: Address ID, Street, Locality, Region, Postal Code, Country
  • Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
  • Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
  • Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
  • Websites: Website ID, URL
  • Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
  • Companies Websites: Company ID, Website ID
  • Companies Addresses: Company ID, Address ID
  • Companies: Company ID
  • Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
  • Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
  • Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
  • Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
  • Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
  • Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
  • Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr." ],
          "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
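To see why one document fans out into many tables, the sketch below shreds a cut-down person document into relational-style rows. It is an illustration only: the `phone` and `email` wrapper objects from the slide are dropped for brevity, and the function `shred` and the snake_case table and column names are assumptions.

```python
# Cut-down person document (wrapper objects omitted for brevity).
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
        ],
        "personEmails": [
            {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"},
        ],
    },
}


def shred(doc):
    """Split one person document into rows, one list of rows per table."""
    pid = doc["personId"]
    person = doc["person"]
    tables = {"persons": [], "person_phones": [], "person_emails": []}
    tables["persons"].append({"person_id": pid, "name": person["personName"]})
    # Every multivalued property becomes its own table keyed by person_id.
    for phone in person.get("personPhones", []):
        for purpose in phone["phonePurpose"]:
            tables["person_phones"].append(
                {"person_id": pid, "phone_purpose": purpose,
                 "phone_number": phone["phoneNumber"]})
    for email in person.get("personEmails", []):
        tables["person_emails"].append(
            {"person_id": pid, "email_address": email["emailAddress"],
             "is_primary": "primary" in email["emailPurpose"]})
    return tables


tables = shred(person_doc)
print({name: len(rows) for name, rows in tables.items()})
```

Even this small subset needs three tables; the arrays inside arrays (phone purposes) multiply the rows further.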



More Realistic Customer Order Model

Same Realistic Customer Order Model

  • A JSON document is a business entity

  • Meaningful graph relationships connect business entities

[Figure: Customer Order document graph. JSON documents for Customers, Orders, Products, and Vendors, connected by the relationships "Product that was ordered" and "Vendor who sold the product".]

Same Realistic Customer Order Model

Sample documents (values cut off on the slide are marked with …)

Person document
  personId: 11
  person:
    personName: Mike Bowers
    personBirthDate: 1981-01-01
    personGender: male
    personEthnicity: caucasian
    personShippingPreference: free
    personPhones: [ phone: phonePurpose [ mobile, primary ], phoneNumber +1 (111) 111-1111 ]
    personAddresses: [ address: addressPurpose [ shipping, mailing ], addressStreet [ 99 Smith Dr. ], addressLocality Clinton, addressRegion UT, addressPostalCode 84015, addressCountry United States ]
    personEmails: [ email: emailPurpose [ work, primary ], emailAddress mike@somework.com ]
    personWebsites: [ website: websitePurpose [ work ], websiteUrl http://www.mike.com ]
    personPaymentMethods: [ paymentMethod: paymentMethodType Visa, paymentMethodStatus Verified, creditCardNumber 1234123412341234, creditCardExpirationDate 2011-11-11, nameOnCreditCard MIKE BOWERS, creditCardValidationAddress mailing ]
  customer:
    customerStatus: active
    customerJoinDate: 2001-01-01
    customerReviewerRank: 11111

Order document
  orderId: 1
  order:
    orderNumber: 111-11-111
    orderDate: 2001-01-01T09:00:00Z
    orderStatus: Shipped
    customer: customerId 11, customerName Mike Bowers
    orderShipment:
      shipper: companyIdFk 777, companyName FedEx
      shipmentMethod: 2nd Day Air
      shipmentCost: 5.87
      shipmentNotes: …
      orderShippingAddress: addressStreet [ 99 Smith Dr. ], addressLocality Clinton, addressRegion UT, addressPostalCode 84015, addressCountry United States
    orderedProducts: [
      product: productIdFk 1111, productName Oreo Thins Sandwich Cookies, productOrderQty 3, productUnitPrice 3.50, productCondition New, productWeightOz 11, productOrderStatus Shipped, productShippedDate 2001-01-01+01:01, seller (sellerId 555, sellerName Nabisco), supplier (supplierId 555, supplierName Nabisco)
      product: productIdFk 2222, productName Rice Dream Rice Drink - Vanilla 64 Fl Oz, productOrderQty 1, productUnitPrice 2.50, productCondition New …

Product document
  productId: 1111
  product:
    productCode: oreo-111
    productStatus: active
    productName: Oreo Thins Sandwich Cookies
    productDescription: YUMMY
    productCategories: [ cookies, chocolate cookies, desserts, snacks …
    productTags: [ cookie, chocolate, snack, treat ]
    productDetailDescriptionURL: http://mysalessite.com/cdn/details/oreo-111
    productAvailabilityDate: 2001-01-01
    productListPrice: 4.50
    productDiscountPrice: 3.50
    productStandardCost: 2.50
    productInventoryReorderLevel: 100
    productInventoryTargetLevel: 1000
    productVendor: vendorId 555, companyName Nabisco
    productSuppliers: [ productSupplier: supplierId 555, companyName Nabisco, standardSupplierProductPrice 2.50, currentSupplierProductPrice 2…, supplierProductQuantityPerUnit 1 box of 35 cookies, supplierProdu…
    productMeasures: [
      productMeasure: productMeasurePurpose productPackage, productWeight 10.1, productWidth 4.5, productHeight 1.7, productLength 6.7
      productMeasure: productMeasurePurpose shippingPackage, productWeight 1…, productWidth 5, productHeight 2, productLength 8
    …

Company document
  companyId: 555
  company:
    companyName: Nabisco
    companyStatus: active
    companyPhones: [ phone: phonePurpose sales …; phone: phonePurpose support …; phone: phonePurpose shipping …
    companyAddresses: [ address: addressPurpose [ shipping …, addressLocality East Hanove…, addressPostalCode 07936 …
    companyEmails: [ email: emailPurpose sales …
    companyWebsites: [ website: websitePurpose home …
    companyPaymentMethods: [ paymentMethod: paymentMethodType Che…, paymentMethodAccountNumber 555…, paymentMethodVerified true
  supplier: supplierJoinDate 2005-05…
  seller: sellerJoinDate 2005-05…

Order lookup document
  orderLookupId: 111111111
  orderStatus: [ New, Invoiced, Shipped, Closed ]
  productOrderStatus: [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
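The documents above point at each other through id properties such as productIdFk and vendorId. A rough sketch of following those references in application code is shown below; the in-memory dicts stand in for a document store, and the function `vendors_for_order` is an assumption made up for this example.

```python
# In-memory stand-ins for two document collections, keyed by document id.
products = {
    1111: {"productName": "Oreo Thins Sandwich Cookies",
           "productVendor": {"vendorId": 555}},
}
companies = {
    555: {"companyName": "Nabisco", "companyStatus": "active"},
}

order = {
    "orderId": 1,
    "orderedProducts": [
        # productName is copied into the order so it can be shown without a lookup.
        {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies",
         "productOrderQty": 3},
    ],
}


def vendors_for_order(order_doc):
    """Follow order -> product -> vendor references and return vendor names."""
    names = []
    for item in order_doc["orderedProducts"]:
        product = products.get(item["productIdFk"])
        if product is None:
            continue  # reference to a product document we do not hold
        company = companies.get(product["productVendor"]["vendorId"])
        if company is not None:
            names.append(company["companyName"])
    return names


print(vendors_for_order(order))
```

Each business entity stays whole in its own document; only the cross-entity hops need a lookup.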

Relational Modeling: Normalize

1. Normalize

  • Make each attribute single valued
      – Create one column per attribute
      – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

  • Group attributes into tables, ensuring each table has one coherent context

  • Assign one primary key to each table

  • Eliminate duplicate attributes across tables

[Figure: relationship types. One-to-one, One-to-many, Many-to-many, Reference Tables.]

DocGraph Modeling: Normalize

  • Create one business entity or transaction per JSON document

  • JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

  • Each embedded structure should be normalized

  • TDE can automatically populate the triple index to denormalize data, and the Optic API can query it

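The idea of populating a triple index from document content can be sketched in plain Python. This is not MarkLogic's TDE syntax or API: the `template` dict, the predicate names hasName and hasEmail, and the function `extract_triples` are all hypothetical stand-ins meant only to show the shape of the idea.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personEmails": [
            {"emailAddress": "mike@somework.com"},
            {"emailAddress": "mike@somewhere.com"},
        ],
    },
}

# Hypothetical template: predicate name -> function pulling values from a document.
template = {
    "hasName": lambda d: [d["person"]["personName"]],
    "hasEmail": lambda d: [e["emailAddress"] for e in d["person"].get("personEmails", [])],
}


def extract_triples(doc, rules):
    """Project selected document values into (subject, predicate, object) triples."""
    subject = doc["personId"]
    return [(subject, predicate, value)
            for predicate, getter in rules.items()
            for value in getter(doc)]


triple_index = extract_triples(person_doc, template)
print(triple_index)
```

The document itself stays normalized; the extracted triples are a derived view that can be rebuilt whenever the template changes.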

DocGraph Modeling: Denormalize

  • TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

  • When triples link documents, the Optic API can query triple data as if it were part of any linked document

  • This is denormalizing without denormalizing
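"Denormalizing without denormalizing" can be pictured as a query-time join over triples: the employer's name is never copied into the person document, yet a query can read it as though it were. The sketch below is plain Python rather than the Optic API; the hasName predicate and both helper functions are assumptions for this illustration.

```python
# Triples linking documents (hasEmployer) and carrying extracted values (hasName).
triples = [
    (11, "hasEmployer", 666),
    (11, "hasName", "Mike Bowers"),
    (666, "hasName", "Goliath Dairies"),
    (555, "hasName", "Nabisco"),
]


def objects(subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def employer_names(person_id):
    """Join person -> employer -> name at query time, without copying data."""
    return [name
            for employer in objects(person_id, "hasEmployer")
            for name in objects(employer, "hasName")]


print(employer_names(11))
```

If the company is renamed, only its own triple changes, and every person linked to it sees the new name on the next query.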


company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts.

• Create transaction tables to join together business tables.

• Create reference tables to standardize entity states and attribute characteristics.

• This maximizes data reuse by allowing tables to be combined with other tables to create any context.

Business Tables, Transaction Tables, Reference Tables

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity.

• A business entity should exactly match how users think of it.

• A transaction should contain everything about the transaction.

• Multiple lookup tables may be combined in one doc.

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change.

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not overgeneralize in a relational model, because doing so hides the purpose of the model.

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing.

• This example shows a person being subclassed as an employee, customer, and customer rep.

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model.

• See subclassing below.
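As a rough sketch of what subclassing inside one document looks like to application code, the Python below treats each top-level section of the person document as a role the person plays. The property names come from the example document; the helper `roles_of` and the trimmed-down document are assumptions made for illustration.

```python
# A trimmed-down version of the person document from the slides.
person_doc = {
    "personId": 11,
    "schemas": [
        {"type": "person"},
        {"type": "customer"},
        {"type": "companyRelationships"},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"companyName": "Nabisco",
         "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
        {"companyName": "Goliath Dairies",
         "personCompanyRelationship": ["independentContractorFor",
                                       "deliveryPersonFor"]},
    ],
}


def roles_of(doc):
    """Collect the roles a person plays, read straight from the document."""
    found = set()
    if "customer" in doc:
        found.add("customer")
    for relationship in doc.get("companyRelationships", []):
        found.update(relationship["personCompanyRelationship"])
    return found


declared_types = [schema["type"] for schema in person_doc["schemas"]]
print(declared_types)
print(sorted(roles_of(person_doc)))
```

Each role keeps its own descriptive property names, so the general person document still says plainly what it is used for.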

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application.

• Optimize sparse data out of a table and put it into one-to-one related tables.

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance.

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic.

• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.

• You manually create inequality (range) indexes based on property name or hierarchical path.
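A toy model helps show why unique property names matter when an equality index is keyed by name rather than by position. The sketch below is not MarkLogic's index implementation; `index_properties` is a hypothetical helper that records every (property name, value) pair it finds at any depth.

```python
from collections import defaultdict


def index_properties(doc_id, node, index, name=None):
    """Record (property name, scalar value) -> document ids, ignoring depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            index_properties(doc_id, value, index, key)
    elif isinstance(node, list):
        # Array items are indexed under the name of the property holding the array.
        for item in node:
            index_properties(doc_id, item, index, name)
    elif name is not None:
        index[(name, node)].add(doc_id)


index = defaultdict(set)
index_properties(11, {
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [{"phonePurpose": ["mobile", "primary"]}],
    },
    "companyRelationships": [{"companyName": "Nabisco"}],
}, index)
index_properties(555, {"company": {"companyName": "Nabisco"}}, index)

# The lookup is by name and value only, wherever the property sits.
print(sorted(index[("companyName", "Nabisco")]))
print(sorted(index[("phonePurpose", "primary")]))
```

Had both documents used a generic property such as `name`, a lookup on that name could not tell a person's name from a company's name, which is why the examples prefix every property with its entity.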

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Document PROs: Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model.

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data.

• For example, the person document is 45K and requires 13 tables to represent it in a relational database.

• You have to join 13 tables to retrieve all of a person's data.

• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.

• 4-to-6 IOs x 13 joins = 52-to-78 IOs.

• MarkLogic indexes are in RAM, so getting a doc is 1 IO.

• MongoDB indexes are B-tree indexes and may require 4 IOs.

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
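The IO estimate above is simple range arithmetic. The following Python sketch reproduces it using only the figures given on the slide; the variable names are illustrative.

```python
# Figures from the slide, each as a (low, high) range.
index_ios_per_join = (3, 4)
row_ios_per_join = (1, 2)
joins = 13

ios_per_join = (
    index_ios_per_join[0] + row_ios_per_join[0],
    index_ios_per_join[1] + row_ios_per_join[1],
)
relational_ios = (ios_per_join[0] * joins, ios_per_join[1] * joins)

# One document read when the indexes are already in RAM, per the slide.
document_ios = 1

print(ios_per_join)
print(relational_ios)
print(relational_ios[0] // document_ios, relational_ios[1] // document_ios)
```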

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction.

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

personAddresses [

address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States

]

Equivalent relational tables:

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
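To make the comparison concrete, here is a minimal Python sketch that shreds the one-document form into the three relational tables. The `shred` function and the surrogate `address_id` values are assumptions for illustration; only a few columns are carried over, and the Is Primary column is omitted.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            },
        ],
    },
}


def shred(doc):
    """Split one person document into rows for three relational tables."""
    persons = [{"person_id": doc["personId"],
                "name": doc["person"]["personName"]}]
    addresses = []
    persons_addresses = []
    # address_id is a hypothetical surrogate key; the document needs none.
    for address_id, address in enumerate(doc["person"]["personAddresses"], start=1):
        addresses.append({
            "address_id": address_id,
            "street": " ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # One link row per purpose: the multi-valued property becomes rows.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "person_id": doc["personId"],
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred(person_doc)
print(len(persons), len(addresses), len(persons_addresses))
```

A single address with two purposes already produces four rows across three tables, and reading it back requires joining them again.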

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed

• Mixing different types of records in the same array is natural in programming but very difficult in relational databases.

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing

paymentMethod paymentMethodType Checking paymentMethodStatus Verified
bankRoutingNumber 11111111 bankAccountNumber 11111111
nameOnBankAccount Michael Bowers
driversLicenseNumber 11111111 driversLicenseState UT

]
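Application code can handle a multi-typed array like this one by looking at which properties each record carries. The Python below is a sketch under that assumption; the `describe` helper is hypothetical and the records are trimmed versions of the two payment methods above.

```python
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "nameOnCreditCard": "MIKE BOWERS",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    },
]


def describe(method):
    """Summarize a payment method based on the properties it actually has."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account, routing {method['bankRoutingNumber']}"
    return f"{kind} (unrecognized shape)"


summaries = [describe(method) for method in payment_methods]
for summary in summaries:
    print(summary)
```

Both record shapes live in the same array without a shared column set, which in a relational model would typically mean one table per payment type or a wide table full of nulls.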

Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data (just write a query)
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
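The bullets above can be sketched in a few lines of plain Python. This is only an illustration, not MarkLogic's query API: the helper names `walk`, `has_key`, and `nested_under` are hypothetical, and the braces in the sample document are an assumed reading of the flattened person example on this slide.

```python
import json

# Sample document; the nesting is an assumption about the original slide.
doc = json.loads("""
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personPhones": [
      {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
      {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                 "hidePhoneNumber": true}}
    ]
  }
}
""")


def walk(node, path=()):
    """Yield (path, value) for every node; a path is a tuple of keys and indexes."""
    yield path, node
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, path + (index,))


def has_key(document, key):
    """Query structure: is this key present anywhere in the document?"""
    return any(path and path[-1] == key for path, _ in walk(document))


def nested_under(document, ancestor, descendant):
    """Query structure: does `descendant` appear somewhere below `ancestor`?"""
    return any(
        path and path[-1] == descendant and ancestor in path[:-1]
        for path, _ in walk(document)
    )


# Structure is changed by updating data: no schema change is involved.
before = has_key(doc, "personNickname")
doc["person"]["personNickname"] = "Mikey"
after = has_key(doc, "personNickname")
```

Because keys are ordinary data, the same traversal answers questions about values and about structure.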

Document PROs: Simple, Easy Data Types

– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
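As a small illustration of the data types listed on this slide, the following sketch parses one JSON object with Python's standard `json` module and records which Python type each value maps to. The key names and values are made up for the example.

```python
import json

# One object exercising each JSON type; all names here are illustrative.
text = """
{
  "a key with spaces, punctuation & ünïcode": "値",
  "integer": 3,
  "decimal": 5.75,
  "flag": true,
  "mixedArray": [1, "two", 3.0, false, null, {"nested": {"deeper": []}}],
  "sparseObject": {}
}
"""

value = json.loads(text)

# Map each top-level key to the name of the Python type it parsed into.
kinds = {key: type(item).__name__ for key, item in value.items()}

# Serializing and parsing again preserves the structure and the values.
round_trip = json.loads(json.dumps(value, ensure_ascii=False))
```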

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of …

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence. …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of business entities connected by named relationships. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Relationship labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order example. JSON document collections (Customers, Orders, Products, Vendors) and XML Documentation are connected by graph relationships: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: Johns Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size   Dose UOM
Minicillan   Drugs R Us           200         mg
Maxicillan   Canada4Less Drugs    400         mg
Minicillan   Drug USA             150         mg


RDF = Standard Meaningful Graphs

Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies

TIP: Search for ontologies at Linked Open Vocabularies (LOV)

• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
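To make the advice about reusing predicate names concrete, here is a minimal sketch that builds a few triples as plain Python tuples using predicates from two of the listed vocabularies, FOAF and Dublin Core. The two namespace IRIs are the published ones for those vocabularies; the `example.org` subject IRIs and the `objects` helper are hypothetical.

```python
# Published namespace IRIs for two of the ontologies listed above.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical subject IRIs, used only for this illustration.
PERSON = "http://example.org/person/11"
SPOUSE = "http://example.org/person/22"
TALK = "http://example.org/publication/1"

# Each triple is (subject, predicate, object); predicates reuse existing terms.
triples = [
    (PERSON, FOAF + "name", "Mike Bowers"),
    (PERSON, FOAF + "knows", SPOUSE),
    (TALK, DCTERMS + "title", "Becoming a Document Modeling Guru"),
    (TALK, DCTERMS + "creator", PERSON),
]


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


creators = objects(triples, TALK, DCTERMS + "creator")
```

Anyone who already knows FOAF or Dublin Core can read these predicates without a project-specific glossary, which is the point of borrowing them.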

New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense?

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL            NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Callouts: Fixed field types. Fixed table structure. Fixed table relationships. Each row has same structure. Fixed set of tables.
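The tables on this slide can be reproduced with Python's built-in `sqlite3` module to show what "fixed and dense" means in practice. This is a sketch under assumptions: the table and column names are invented for the example (the slide does not name its tables), and `drug_route` is a hypothetical new attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table and column names are assumptions; the rows come from the slide.
conn.executescript("""
CREATE TABLE surgeon (
    surgeon_id INTEGER PRIMARY KEY,
    surgeon_name TEXT NOT NULL,
    surgeon_specialty TEXT NOT NULL
);
CREATE TABLE operation (
    operation_id INTEGER PRIMARY KEY,
    surgeon_id INTEGER NOT NULL REFERENCES surgeon(surgeon_id),
    hospital_id INTEGER NOT NULL
);
CREATE TABLE drug (
    drug_id INTEGER PRIMARY KEY,
    drug_name TEXT NOT NULL
);
CREATE TABLE operation_drug (
    operation_id INTEGER NOT NULL REFERENCES operation(operation_id),
    drug_id INTEGER NOT NULL REFERENCES drug(drug_id),
    drug_dose INTEGER,      -- nullable: the only way a field can be sparse
    drug_dose_uom TEXT
);
INSERT INTO surgeon VALUES (1, 'Dorothy Oz', 'Cardiothoracic'),
                           (2, 'Martin Fields', 'Neurosurgery');
INSERT INTO operation VALUES (13, 1, 7);
INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
INSERT INTO operation_drug VALUES (13, 100, 200, 'mg'),
                                  (13, 101, NULL, NULL),
                                  (13, 100, 150, 'mg');
""")

# Reassembling one operation takes a join across every table.
rows = conn.execute("""
    SELECT s.surgeon_name, d.drug_name, od.drug_dose, od.drug_dose_uom
    FROM operation AS o
    JOIN surgeon AS s ON s.surgeon_id = o.surgeon_id
    JOIN operation_drug AS od ON od.operation_id = o.operation_id
    JOIN drug AS d ON d.drug_id = od.drug_id
    WHERE o.operation_id = 13
    ORDER BY od.rowid
""").fetchall()

# Adding one attribute is a schema change, and every row gets the column.
conn.execute("ALTER TABLE operation_drug ADD COLUMN drug_route TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(operation_drug)")]
routes = [row[0] for row in conn.execute("SELECT drug_route FROM operation_drug")]
```

Every existing row now carries a NULL `drug_route`, which is the dense structure the slide describes.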

How does NoSQL support variety and variability?

NoSQL documents have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

The second administered drug (Minomycin) has no dose: a sparse property.

Callouts: Variable Data Types. Variable Document Types. Variable Relationships. Variable and Sparse Collections. Variable Document Structures. Sparse Properties. Sparse Denormalized Properties.
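A short Python sketch shows how application code might work with the operation document above. It is an illustration under assumptions: the `relations` list is simplified to a flat list of triples, the helper names `doses` and `related` are made up, and `opAnesthetist` is a hypothetical new predicate.

```python
# A trimmed copy of the operation document; nesting is an assumed reading.
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "hospitalName": "Johns Hopkins",
        "surgeonName": "Dorothy Oz",
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
    "relations": [
        {"subject": 1, "predicate": "opSurgeon", "object": 10000},
        {"subject": 1, "predicate": "opDrug", "object": 10000},
        {"subject": 1, "predicate": "opDrug", "object": 20000},
        {"subject": 1, "predicate": "opDrug", "object": 30000},
    ],
}


def doses(doc):
    """List (name, size, unit); absent sparse properties come back as None."""
    return [
        (d["drugName"], d.get("drugDoseSize"), d.get("drugDoseUOM"))
        for d in doc["operation"]["administeredDrugs"]
    ]


def related(doc, predicate):
    """Return the objects of every relationship with the given predicate."""
    return [r["object"] for r in doc["relations"] if r["predicate"] == predicate]


# Relationships are data: a new kind of relationship is just another entry.
operation_doc["relations"].append(
    {"subject": 1, "predicate": "opAnesthetist", "object": 40000}
)
```

No schema change was needed for the missing dose or for the new relationship type; both are ordinary data.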

Choosing Between XML and JSON

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date, <date xsi:type="xsd:date">2017-04-02Z</date>, to be
    freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything.
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects." },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA.
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
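The contrast between tags sprinkled into text and text poured into objects shows up directly in parsing code. The sketch below uses only Python's standard library (`xml.etree.ElementTree` and `json`); the two sample strings are shortened variants of the slide's examples, not verbatim copies.

```python
import json
import xml.etree.ElementTree as ET

# XML: a typed value sits inside running text (mixed content).
xml_text = (
    "<paragraph>XML allows data, such as this date, "
    "<date>2017-04-02Z</date>, to be freely mixed into the text.</paragraph>"
)
paragraph = ET.fromstring(xml_text)
xml_plain = "".join(paragraph.itertext())          # text with the tags removed
xml_dates = [d.text for d in paragraph.iter("date")]

# JSON: the same paragraph as an array of typed objects.
json_text = """
{"paragraph": [
  {"type": "text", "value": "JSON keeps text, such as this date, "},
  {"type": "date", "value": "2017-04-02Z"},
  {"type": "text", "value": ", in separate objects."}
]}
"""
runs = json.loads(json_text)["paragraph"]
json_plain = "".join(run["value"] for run in runs if run["value"] is not None)
json_dates = [run["value"] for run in runs if run["type"] == "date"]
```

In the XML version the prose stays readable as prose and the date is an annotation on it. In the JSON version the date is a first-class object and the prose has to be stitched back together.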

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Tables and columns in the entity-relationship diagram:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.
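To illustrate why one business entity spread over several tables maps naturally onto one document, here is a minimal sketch that assembles a person document from rows of three of those tables. The row dictionaries stand in for query results, the sample email address is illustrative, and `assemble_person` is a hypothetical helper, not part of any library.

```python
# Rows as they might come back from three of the relational tables.
person_rows = [
    {"person_id": 11, "name": "Mike Bowers", "birth_date": "1981-01-01"},
]
phone_rows = [
    {"phone_id": 1, "person_id": 11, "phone_purpose": "mobile",
     "phone_number": "+1 (111) 111-1111"},
    {"phone_id": 2, "person_id": 11, "phone_purpose": "work",
     "phone_number": "+1 (111) 111-1112"},
]
email_rows = [
    {"email_id": 1, "person_id": 11, "email_purpose": "work",
     "email_address": "mike@somework.com", "is_primary": True},
]


def assemble_person(person_id):
    """Fold the child rows for one person into a single nested document."""
    person = next(r for r in person_rows if r["person_id"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": person["name"],
            "personBirthDate": person["birth_date"],
            # Each one-to-many table becomes an embedded array.
            "personPhones": [
                {"phone": {"phonePurpose": [r["phone_purpose"]],
                           "phoneNumber": r["phone_number"]}}
                for r in phone_rows if r["person_id"] == person_id
            ],
            "personEmails": [
                {"email": {
                    # The Is Primary flag becomes one more value in the array.
                    "emailPurpose": [r["email_purpose"]]
                    + (["primary"] if r["is_primary"] else []),
                    "emailAddress": r["email_address"]}}
                for r in email_rows if r["person_id"] == person_id
            ],
        },
    }


person_doc = assemble_person(11)
```

The foreign keys and surrogate IDs of the child tables disappear, because containment in the document already expresses which phones and emails belong to the person.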

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [
                     { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}


More Realistic Customer Order Model

[Figure: entity-relationship diagram of a larger customer order model with many more tables (for example Order, Order Items, Lookup Lists, Fact). Most table names in the diagram are placeholder text.]

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities

[Figure: Customer Order example. JSON document collections (Customers, Orders, Products, Vendors) are connected by graph relationships labeled "Product that was ordered" and "Vendor who sold the product".]

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": …
    },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
          … } }
    ]
  }
}

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks", …
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1…
          "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    ],
    …
  }
}

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } }
    ],
    "companyAddresses": [
      { "address": { "addressPurpose": ["shipping", …
                     "addressLocality": "East Hanove…
                     "addressPostalCode": "07936", … } }
    ],
    "companyEmails": [ { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Che…
                           "paymentMethodAccountNumber": 555…
                           "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…
}

{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}

Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure: table relationship patterns: One-to-one, One-to-many, Many-to-many, Reference Tables]
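The first normalization step above, making each attribute single valued, can be sketched as a small Python function that splits a multivalued attribute into a parent row and one-to-many child rows. The record layout and the name `normalize_phones` are assumptions made for the illustration.

```python
def normalize_phones(person):
    """Split a multivalued 'phones' attribute into a parent row and child rows.

    The parent row keeps only single-valued attributes; each phone becomes a
    child row with its own primary key and a foreign key to the person.
    """
    person_row = {"person_id": person["person_id"], "name": person["name"]}
    phone_rows = [
        {"phone_id": phone_id, "person_id": person["person_id"], "phone_number": number}
        for phone_id, number in enumerate(person["phones"], start=1)
    ]
    return person_row, phone_rows


person_row, phone_rows = normalize_phones(
    {
        "person_id": 11,
        "name": "Mike Bowers",
        "phones": ["+1 (111) 111-1111", "+1 (111) 111-1112"],
    }
)
```

Reading the person back requires a join on `person_id`, which is the cost the document model avoids by keeping the array inside the entity.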

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and Optic API can query it
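The idea in the last bullet, deriving triples from data that lives inside a document, can be imitated in plain Python. This sketch is not MarkLogic's TDE or Optic API and makes no claim about how they work internally. The function `project_triples` is hypothetical, and the document is a trimmed copy of the person example with an assumed nesting.

```python
# Trimmed person document; the nesting is an assumed reading of the slide.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def project_triples(doc):
    """Derive (subject, predicate, object) triples from embedded relationships."""
    subject = doc["personId"]
    triples = []
    for entry in doc.get("companyRelationships", []):
        company = entry["company"]
        # Each relationship name becomes a predicate pointing at the company.
        for predicate in company["personCompanyRelationship"]:
            triples.append((subject, predicate, company["companyIdFk"]))
    return triples


triples = project_triples(person_doc)
employers = [o for s, p, o in triples if p == "employeeOf"]
```

The document stays the single source of truth, and the triples are a derived view that can be joined with triples projected from other documents.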

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

DocGraph Modeling Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
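The idea of "denormalizing without denormalizing" can be sketched in plain Python. This is only a toy model of the concept, not MarkLogic's TDE or Optic API: the helper names `extract_triples` and `employer_names`, the `hasName` predicate, and the separate company document for ID 666 are all assumptions made for illustration.

```python
# Two documents stored separately; only IDs and triples link them.
person = {
    "personId": 11,
    "triples": [
        {"subject": 11, "predicate": "hasSpouse", "object": 22},
        {"subject": 11, "predicate": "hasEmployer", "object": 666},
    ],
    "person": {"personName": "Mike Bowers"},
}
# Hypothetical company document (the slides only show its ID and name).
company = {"companyId": 666, "company": {"companyName": "Goliath Dairies"}}


def extract_triples(doc, id_key, section, name_key):
    """Hypothetical stand-in for a template: emit the document's own
    triples plus one derived (id, hasName, name) triple."""
    triples = [(t["subject"], t["predicate"], t["object"])
               for t in doc.get("triples", [])]
    triples.append((doc[id_key], "hasName", doc[section][name_key]))
    return triples


# A single in-memory "triple index" populated from every document.
index = (extract_triples(person, "personId", "person", "personName")
         + extract_triples(company, "companyId", "company", "companyName"))


def employer_names(person_id):
    """Join hasEmployer triples to hasName triples at query time."""
    names = {s: o for s, p, o in index if p == "hasName"}
    return [names[o] for s, p, o in index
            if s == person_id and p == "hasEmployer" and o in names]


print(employer_names(11))  # ['Goliath Dairies']
```

The company name is never copied into the person document; it is found through the index when the query runs, which is the effect the slide describes.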


Relational Modeling Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: business tables, transaction tables, and reference tables]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

Relational Modeling Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not overgeneralize in relational because it hides the purpose of the model

DocGraph Modeling Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

[Figure: the person JSON document (personId 11) from the Normalize slide; its embedded customer and companyRelationships structures subclass the person]
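A minimal sketch of this kind of subclassing, assuming the person document is loaded as a Python dict. The property names come from the example document, but the flattened array shape, the omission of schema versions, and the `roles` helper are simplifications made for illustration.

```python
# One generalized "person" document; each role is an embedded section.
person_doc = {
    "personId": 11,
    "schemas": [
        {"type": "person"},
        {"type": "customer"},
        {"type": "companyRelationships"},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"companyName": "Nabisco",
         "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
        {"companyName": "Goliath Dairies",
         "personCompanyRelationship": ["independentContractorFor",
                                       "deliveryPersonFor"]},
    ],
}


def roles(doc):
    """List the roles this person plays, read from the embedded sections."""
    found = []
    if "customer" in doc:
        found.append("customer")
    for rel in doc.get("companyRelationships", []):
        found.extend(rel["personCompanyRelationship"])
    return found


print(roles(person_doc))
```

Adding a new role means adding a new embedded section or relationship value, and the section names keep the purpose of each part of the document visible.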

Relational Modeling Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
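To see why unique property names matter, here is a toy model of an equality index keyed only on property name and value. It is a sketch of the idea, not MarkLogic's actual index implementation; the `index_values` function and the document URIs are hypothetical.

```python
from collections import defaultdict


def index_values(doc_id, name, value, index):
    """Record (property name, value) -> document ids, ignoring where in
    the hierarchy the property occurs."""
    if isinstance(value, dict):
        for child_name, child in value.items():
            index_values(doc_id, child_name, child, index)
    elif isinstance(value, list):
        # Array items are indexed under the array's property name.
        for item in value:
            index_values(doc_id, name, item, index)
    else:
        index[(name, value)].add(doc_id)


docs = {
    "person-11": {
        "personId": 11,
        "person": {
            "personName": "Mike Bowers",
            "personPhones": [
                {"phonePurpose": ["mobile", "primary"],
                 "phoneNumber": "+1 (111) 111-1111"},
            ],
        },
    },
    "order-1": {
        "orderId": 1,
        "order": {"customer": {"customerId": 11,
                               "customerName": "Mike Bowers"}},
    },
}

index = defaultdict(set)
for uri, doc in docs.items():
    index_values(uri, None, doc, index)

print(sorted(index[("personName", "Mike Bowers")]))    # ['person-11']
print(sorted(index[("customerName", "Mike Bowers")]))  # ['order-1']
print(sorted(index[("phonePurpose", "primary")]))      # ['person-11']
```

If both documents had used a generic property such as `name`, a lookup on that name would return both documents, and the query would need extra path constraints to tell them apart.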


Agenda

1. Database Modeling Paradigms

2. From Relational to Document Graph

3. Document Advantages


Document PROs Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

[Figure: five JSON documents: person (personId 11), order (orderId 1), product (productId 1111), company (companyId 555), and order lookup lists]

Document PROs Performance

One JSON document requires one I/O

• Relational requires dozens of I/Os for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database

• You have to join 13 tables to retrieve all of a person's data

• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row

• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os

• MarkLogic indexes are in RAM, so getting a doc is 1 I/O

• MongoDB indexes are B-tree indexes and may require 4 I/Os

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
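The I/O estimate above can be checked with a few lines of arithmetic. The per-join figures are the ones quoted on the slide; the constant and function names are just labels chosen for this sketch.

```python
TABLES_JOINED = 13           # tables needed for one person, per the slide
INDEX_IOS_PER_JOIN = (3, 4)  # low and high estimate for index reads
ROW_IOS_PER_JOIN = (1, 2)    # low and high estimate for the row read
DOCUMENT_IOS = 1             # one read for the whole JSON document


def relational_io_range(joins=TABLES_JOINED):
    """Return the (low, high) I/O estimate for joining `joins` tables."""
    low = (INDEX_IOS_PER_JOIN[0] + ROW_IOS_PER_JOIN[0]) * joins
    high = (INDEX_IOS_PER_JOIN[1] + ROW_IOS_PER_JOIN[1]) * joins
    return low, high


low, high = relational_io_range()
print(low, high)  # 52 78
print(f"{low // DOCUMENT_IOS}x to {high // DOCUMENT_IOS}x more I/Os than one document read")
```

These are the slide's rough estimates, not measurements; real numbers depend on caching, index depth, and the query plan.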


Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]
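As a rough illustration of why a multi-valued property removes the need for a join table, the sketch below stores the address from the slide as a plain Python dictionary and filters it by purpose. The helper name `addresses_for` and the exact nesting of the document are assumptions made for this example; they are not part of any MarkLogic API.

```python
# Hypothetical person document modeled on the slide's personAddresses example.
person = {
    "personId": 11,
    "personAddresses": [
        {
            "address": {
                # One address serves two purposes: no join table required.
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }
        }
    ],
}


def addresses_for(doc, purpose):
    """Return every address in the document that lists the given purpose."""
    return [
        entry["address"]
        for entry in doc.get("personAddresses", [])
        if purpose in entry["address"].get("addressPurpose", [])
    ]


shipping = addresses_for(person, "shipping")
billing = addresses_for(person, "billing")
print(len(shipping), len(billing))
```

In a relational model the same lookup would typically join a persons table, an addresses table, and a junction table that carries the purpose.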

Relational tables needed for the same data:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed.
• Mixing different types of records in the same array is natural in programming but very difficult in relational databases.

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa",
      "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234",
      "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS",
      "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking",
      "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111",
      "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111",
      "driversLicenseState": "UT" } }
]
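A minimal sketch of how application code might consume such a multi-typed array: each record carries its own type tag, and the code branches on it. The function name `describe` and the flattened record shape (without the `paymentMethod` wrapper) are simplifications assumed for this example.

```python
# Two records of different shapes living in the same list.
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
    },
]


def describe(method):
    """Summarize a payment method using only the fields its type defines."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"card ending {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"bank account ending {method['bankAccountNumber'][-4:]}"
    return f"unrecognized method {kind!r}"


descriptions = [describe(m) for m in payment_methods]
print(descriptions)
```

A relational design would usually need one table per payment type (as in the Visa and Bank payment tables of the person ERD) plus junction tables to tie them to a person.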

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Everything is Data

Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays

Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
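To make "structure can be queried" concrete, here is a small sketch that searches a document for the presence of a key at any nesting depth and reports where it was found. The function `find_key_paths` is a hypothetical helper written for this example; a database such as MarkLogic would answer this kind of question from its indexes rather than by walking the document.

```python
def find_key_paths(node, key, path=()):
    """Yield the path to every occurrence of `key`, at any nesting depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from find_key_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key_paths(item, key, path + (index,))


person = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

hidden = list(find_key_paths(person, "hidePhoneNumber"))
numbers = list(find_key_paths(person, "phoneNumber"))
print(hidden)
```

Only the second phone carries `hidePhoneNumber`, so the query returns a single path; the first phone simply omits the key instead of storing a null.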

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
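The short sketch below parses one JSON object that exercises each of these types and shows how Python's standard `json` module maps them. The sample values are invented for illustration.

```python
import json

text = (
    '{"name": "Zoë", "heightInFeet": 5.75, "age": 35, "active": true, '
    '"nickname": null, "mixed": [1, "two", false, {"nested": {}}]}'
)

doc = json.loads(text)

# Map each top-level property to the Python type the parser produced.
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)

# Serializing and parsing again yields an equal document.
round_tripped = json.loads(json.dumps(doc))
```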

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6

June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: knowledge graph of Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes, connected by labeled edges such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution".]

[Figure: "Customer Order" diagram of Customers, Orders, Products, and Vendors JSON documents and XML documentation, connected by the relationships "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product". The XML inset repeats the hospital operation record and its drug table.]

Inside and Out


Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name: Johns Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz


New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Fields and properties
  Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
  NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Rows and documents
  Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
  NoSQL: Each document may be of zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Tables and collections
  Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
  NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

Relationships
  Relational: Each table may have zero to a few fixed relationships to other tables. Constraints are not data.
  NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Schemas and databases
  Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
  NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense?

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL |
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Callouts: Fixed field types; Fixed table structure; Fixed table relationships; Each row has same structure; Fixed set of tables.
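The fixed, dense behaviour described above can be reproduced with Python's built-in `sqlite3` module. This is only a sketch: the table and column names are assumptions chosen to mirror the slide, not a schema from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE drugs (drug_id INTEGER PRIMARY KEY, drug_name TEXT NOT NULL);
    CREATE TABLE operation_drugs (
        operation_id  INTEGER NOT NULL,
        drug_id       INTEGER NOT NULL REFERENCES drugs(drug_id),
        drug_dose     INTEGER,  -- nullable: the only way a field can be sparse
        drug_dose_uom TEXT
    );
    INSERT INTO drugs VALUES (100, 'Minocycline'), (101, 'Minomycin');
    INSERT INTO operation_drugs VALUES
        (13, 100, 200, 'mg'), (13, 101, NULL, NULL), (13, 100, 150, 'mg');
    """
)

# Every row comes back with the same columns; missing facts show up as NULL.
rows = conn.execute(
    """
    SELECT d.drug_name, od.drug_dose, od.drug_dose_uom
    FROM operation_drugs AS od
    JOIN drugs AS d ON d.drug_id = od.drug_id
    WHERE od.operation_id = 13
    ORDER BY od.rowid
    """
).fetchall()

# Recording a new kind of fact requires a schema change before any row can hold it.
conn.execute("ALTER TABLE operation_drugs ADD COLUMN drug_manufacturer TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(operation_drugs)")]
print(rows)
print(columns)
```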

How does NoSQL support variety and variability?

NoSQL documents can have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },        ← sparse property
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties; Sparse Denormalized Properties.
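As a sketch of what "sparse properties" and "relationships are data" mean for code that reads such a document, the example below sums the recorded doses while tolerating a drug that has no dose, and looks up relationships by predicate. The functions `total_dose` and `related` are hypothetical helpers, and the document is a trimmed copy of the slide's operation record.

```python
operation = {
    "_id": 1,
    "_type": "operation",
    "operation": {
        "hospitalName": "Johns Hopkins",
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            # Sparse: no dose was recorded, so the keys are simply absent.
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
    "relations": {
        "values": [
            {"subject": 1, "predicate": "opHospital", "object": 10},
            {"subject": 1, "predicate": "opDrug", "object": 10000, "drugRecalls": 1},
            {"subject": 1, "predicate": "opDrug", "object": 20000, "drugRecalls": 3},
        ]
    },
}


def total_dose(doc, uom="mg"):
    """Sum the doses recorded in the given unit, skipping drugs with no dose."""
    drugs = doc["operation"]["administeredDrugs"]
    return sum(d["drugDoseSize"] for d in drugs if d.get("drugDoseUOM") == uom)


def related(doc, predicate):
    """Return the object ids of every relationship with the given predicate."""
    return [r["object"] for r in doc["relations"]["values"] if r["predicate"] == predicate]


print(total_dose(operation), related(operation, "opDrug"))
```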

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures. </paragraph>
  <paragraph>XML allows data, such as this date, <date xsi:type="xsd:date">2017-04-02Z</date>, to be freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects." },
        { "type": "text", "value": "Data exists in separate objects." },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings." } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes. No CDATA.
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything.
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.


JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.
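The difference shows up quickly in code. In the sketch below, which uses only Python's standard library, the XML paragraph keeps its running text intact with a typed value tagged inside it, while the JSON version has to split the text into separate objects. The sample strings are shortened, hypothetical variants of the slide's examples.

```python
import json
import xml.etree.ElementTree as ET

# XML: tags are sprinkled into running text (mixed content).
xml_text = (
    "<paragraph>XML allows data, such as this date "
    '<date type="date">2017-04-02Z</date>'
    " to be freely mixed into the <b>text</b>.</paragraph>"
)
paragraph = ET.fromstring(xml_text)
# itertext() walks text nodes and child elements in document order.
plain_text = "".join(paragraph.itertext())
date_value = paragraph.find("date").text

# JSON: text must be poured into objects, one object per typed fragment.
json_text = json.dumps(
    {
        "paragraph": [
            {"type": "text", "value": "Data exists in separate objects: "},
            {"type": "date", "value": "2017-04-02Z"},
        ]
    }
)
parts = json.loads(json_text)["paragraph"]
json_dates = [part["value"] for part in parts if part["type"] == "date"]

print(plain_text)
print(date_value, json_dates)
```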


Choosing Between XML and JSON

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.
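A small sketch of where the extra tables come from: shredding one person document produces a parent row plus one child table for every multivalued property. The function `shred`, the simplified document shape, and the `example.com` address are assumptions made for this illustration.

```python
person_doc = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
    "personEmails": [
        {"emailPurpose": ["work", "primary"], "emailAddress": "mike@example.com"},
    ],
}


def shred(doc):
    """Split one document into row lists: one table per multivalued property."""
    tables = {"persons": [{}]}
    parent_row = tables["persons"][0]
    for key, value in doc.items():
        if isinstance(value, list):
            # Each array becomes a child table keyed back to the parent row.
            tables[key] = [{"personId": doc["personId"], **item} for item in value]
        else:
            parent_row[key] = value
    return tables


tables = shred(person_doc)
print(sorted(tables))
```

Even this tiny document needs three tables, and the nested `phonePurpose` and `emailPurpose` arrays would each call for another one before the result is fully flat.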

Person ERD

One Person JSON Doc = 13 Tables

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]


[Figure: entity-relationship diagram of a larger schema with many interrelated tables, including Order, Order Items, Fact, and Lookup Lists.]

More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities

[Figure: "Customer Order" diagram of Customers, Orders, Products, and Vendors JSON documents, connected by the relationships "Product that was ordered" and "Vendor who sold the product".]
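The two bullets can be sketched in a few lines: each business entity is one document, and the relationships between entities are stored as subject, predicate, object triples that can be followed in any direction the data allows. The document keys, the predicate names (`placedOrder`, `orderedProduct`, `soldBy`), and the helper functions are hypothetical, chosen only to mirror the diagram.

```python
# One document per business entity, keyed by a made-up identifier.
documents = {
    "customer/11": {"customerName": "Mike Bowers"},
    "order/1": {"orderNumber": "111-11-111"},
    "product/1111": {"productName": "Oreo Thins Sandwich Cookies"},
    "vendor/555": {"companyName": "Nabisco"},
}

# Relationships are data: (subject, predicate, object).
triples = [
    ("customer/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("product/1111", "soldBy", "vendor/555"),
]


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def vendors_for_customer(customer_id):
    """Follow customer -> order -> product -> vendor and collect vendor names."""
    names = []
    for order_id in objects(customer_id, "placedOrder"):
        for product_id in objects(order_id, "orderedProduct"):
            for vendor_id in objects(product_id, "soldBy"):
                names.append(documents[vendor_id]["companyName"])
    return names


print(vendors_for_customer("customer/11"))
```

Adding a new kind of relationship, such as a customer liking a product, means appending a triple rather than altering a schema.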

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

productWebsites [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New Invoiced Shipped Closed ]

productOrderStatus [ None Allocated Invoiced Shipped On Order No Stock ]

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize

• Make each attribute single-valued

  - Create one column per attribute

  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

Diagram labels: One-to-one, One-to-many, Many-to-many, Reference Tables
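A minimal sketch of the normalization step above, using Python's standard-library sqlite3 module. The table and column names (person, person_phone, and so on) are illustrative assumptions; they are not taken from the slide's diagram.

```python
import sqlite3

# Hypothetical source record with a multivalued attribute (phones).
person = {
    "person_id": 11,
    "person_name": "Mike Bowers",
    "phones": [("mobile", "+1 (111) 111-1111"), ("work", "+1 (111) 111-1112")],
}

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,   -- one primary key per table
        person_name TEXT NOT NULL          -- one column per single-valued attribute
    );
    CREATE TABLE person_phone (            -- multivalued attribute becomes a child table
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER NOT NULL REFERENCES person(person_id),
        phone_purpose TEXT NOT NULL,
        phone_number  TEXT NOT NULL
    );
    """
)

conn.execute(
    "INSERT INTO person VALUES (?, ?)",
    (person["person_id"], person["person_name"]),
)
conn.executemany(
    "INSERT INTO person_phone (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [(person["person_id"], purpose, number) for purpose, number in person["phones"]],
)

# Reassembling the person now takes a one-to-many join.
rows = conn.execute(
    """
    SELECT p.person_name, ph.phone_purpose, ph.phone_number
    FROM person AS p
    JOIN person_phone AS ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
    """
).fetchall()
```

Each additional multivalued attribute (addresses, emails, payment methods) would add another child table and another join in the same way.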

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
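As a rough illustration of embedding a multivalued attribute instead of splitting it into a child table, the sketch below models part of the person example as a Python dictionary. The property names come from the slide's example; the exact nesting is an assumption, because the punctuation of the original JSON did not survive, and phone_numbers is a hypothetical helper.

```python
import json

# Property names follow the slide's person example; the nesting is assumed.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}


def phone_numbers(doc, purpose):
    """Return the phone numbers tagged with the given purpose, with no join."""
    return [
        entry["phone"]["phoneNumber"]
        for entry in doc["person"]["personPhones"]
        if purpose in entry["phone"]["phonePurpose"]
    ]


# The whole business entity serializes as a single document.
serialized = json.dumps(person_doc, indent=2)
```

Calling phone_numbers(person_doc, "work") reads the embedded array directly; the phones never leave the person document.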

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
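The idea can be sketched with a toy in-memory triple index in Python. This is not TDE or the Optic API and does not mirror their interfaces; build_triple_index, objects, and employer_names are hypothetical helpers, and the two documents are trimmed-down versions of the slide's examples.

```python
# Two hypothetical documents keyed by ID; values follow the slide's example.
docs = {
    11: {"personName": "Mike Bowers", "triples": [(11, "hasEmployer", 666)]},
    666: {"companyName": "Goliath Dairies", "triples": []},
}


def build_triple_index(documents):
    """Collect embedded triples, plus name triples projected from properties."""
    index = set()
    for doc_id, doc in documents.items():
        index.update(doc["triples"])
        for prop in ("personName", "companyName"):
            if prop in doc:
                index.add((doc_id, prop, doc[prop]))
    return index


def objects(index, subject, predicate):
    """Return all objects recorded for a subject/predicate pair."""
    return sorted(o for s, p, o in index if s == subject and p == predicate)


def employer_names(index, person_id):
    """Follow hasEmployer links, then read each employer's name from the index."""
    return [
        name
        for company_id in objects(index, person_id, "hasEmployer")
        for name in objects(index, company_id, "companyName")
    ]


triple_index = build_triple_index(docs)
```

The company name is stored only in the company document, yet employer_names(triple_index, 11) can return it alongside the person, which is the sense in which the data is denormalized without being copied.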


Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

Diagram labels: Business Tables, Transaction Tables, Reference Tables


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational, because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects lend themselves to subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the subclassing below

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
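One way to read the subclassing in this document is that each role is a named section layered onto the base person. The Python sketch below assumes that reading; the trimmed-down document and the roles helper are hypothetical, and only the property names come from the example.

```python
# Hypothetical, trimmed-down person document; section names follow the example.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {
            "companyIdFk": 555,
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        }},
        {"company": {
            "companyIdFk": 666,
            "personCompanyRelationship": ["independentContractorFor",
                                          "deliveryPersonFor"],
        }},
    ],
}


def roles(doc):
    """List the roles a person document plays, based on which sections exist."""
    found = ["person"]
    if "customer" in doc:
        found.append("customer")
    for rel in doc.get("companyRelationships", []):
        found.extend(rel["company"]["personCompanyRelationship"])
    return found
```

Because each subclass is a visibly named section, a reader of the document can still see what the person is being used for, which a generic relational person table tends to obscure.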

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality (range) indexes based on property name or hierarchical path
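To see why unique property names matter when an index is keyed by name alone, consider the toy index below. It is a simplified Python stand-in, not MarkLogic's indexing code; index_by_property_name and the sample documents are hypothetical.

```python
from collections import defaultdict


def index_by_property_name(doc_id, value, index, name=None):
    """Index every scalar under its property name, ignoring its depth."""
    if isinstance(value, dict):
        for child_name, child in value.items():
            index_by_property_name(doc_id, child, index, child_name)
    elif isinstance(value, list):
        for item in value:
            index_by_property_name(doc_id, item, index, name)
    elif name is not None:
        index[(name, value)].add(doc_id)


index = defaultdict(set)

# A generic "name" property means different things at different depths.
index_by_property_name(
    "person-11", {"name": "Mike Bowers", "employer": {"name": "Nabisco"}}, index
)
index_by_property_name("company-555", {"name": "Nabisco"}, index)

# Unique names keep the two meanings apart.
index_by_property_name(
    "person-11b",
    {"personName": "Mike Bowers", "employer": {"companyName": "Nabisco"}},
    index,
)
```

An equality lookup on ("name", "Nabisco") matches both the company and the person who merely works there, while ("companyName", "Nabisco") is unambiguous about which property is meant.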


Agenda

1. Database Modeling Paradigms

2. From Relational to Document Graph

3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

productWebsites [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New Invoiced Shipped Closed ]

productOrderStatus [ None Allocated Invoiced Shipped On Order No Stock ]

Document PROs: Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database

• You have to join 13 tables to retrieve all of a person's data

• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row

• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
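The IO estimate above can be reproduced with a few lines of Python. The per-join figures are the slide's own estimates; the constant and function names are illustrative only.

```python
TABLES_TO_JOIN = 13          # tables needed for one person, per the slide
INDEX_IOS_PER_JOIN = (3, 4)  # low and high estimates for index reads
ROW_IOS_PER_JOIN = (1, 2)    # low and high estimates for the row read


def relational_ios(joins, index_ios, row_ios):
    """Return the (low, high) IO estimate for assembling one entity."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


low, high = relational_ios(TABLES_TO_JOIN, INDEX_IOS_PER_JOIN, ROW_IOS_PER_JOIN)
print(f"relational: {low}-to-{high} IOs, document: 1 IO")
```

The low bound uses 4 IOs per join and the high bound uses 6, which is where the 52-to-78 range comes from.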


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses: [
  address:
    addressPurpose: [ shipping, mailing ]
    addressStreet: [ 99 Smith Dr ]
    addressLocality: Clinton
    addressRegion: UT
    addressPostalCode: 84015
    addressCountry: United States
]

Relational tables for the same data:
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
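As a rough illustration of the multi-valued personAddresses property above, the following sketch filters a person's addresses by purpose without any join. The exact nesting (an `address` wrapper object inside each array item) is an assumption reconstructed from the slide, and `addresses_for` is a hypothetical helper, not a MarkLogic API.

```python
# One person document; addressPurpose is itself multi-valued.
person = {
    "personId": 11,
    "personAddresses": [
        {
            "address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }
        }
    ],
}


def addresses_for(doc, purpose):
    """Return every address in the document tagged with the given purpose."""
    return [
        item["address"]
        for item in doc.get("personAddresses", [])
        if purpose in item["address"].get("addressPurpose", [])
    ]


shipping = addresses_for(person, "shipping")
billing = addresses_for(person, "billing")
print(len(shipping), len(billing))
```

In the relational layout the same question would touch the Persons, Persons Addresses, and Addresses tables.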

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Mixing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods: [
  paymentMethod:
    paymentMethodType: Visa
    paymentMethodStatus: Verified
    creditCardNumber: 1234123412341234
    creditCardExpirationDate: 2011-11-11
    nameOnCreditCard: MIKE BOWERS
    creditCardValidationAddress: mailing
  paymentMethod:
    paymentMethodType: Checking
    paymentMethodStatus: Verified
    bankRoutingNumber: 11111111
    bankAccountNumber: 11111111
    nameOnBankAccount: Michael Bowers
    driversLicenseNumber: 11111111
    driversLicenseState: UT
]
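A minimal sketch of working with a multi-typed array like personPaymentMethods: each record carries its own set of properties, and code branches on paymentMethodType. The flat record shape and the `describe` helper are assumptions made for illustration.

```python
# Two records of different shapes in the same array.
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    },
]


def describe(method):
    """Summarize a payment method using only the fields its type defines."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending in {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account at routing number {method['bankRoutingNumber']}"
    return f"Unknown payment method type: {kind}"


summaries = [describe(m) for m in payment_methods]
print(summaries)
```

In the relational model shown elsewhere in this deck, the same data is split across separate Visa Payment Methods and Bank Payment Methods tables.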


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values
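To make "structure can be queried" concrete, here is a small sketch in plain Python rather than a MarkLogic query. The `key_paths` helper and the dotted path notation (with `[]` marking array items) are hypothetical conventions chosen for this example.

```python
def key_paths(node, prefix=""):
    """Yield every key path found in a nested JSON-like value."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, prefix + "[]")


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}

# Query the structure: is there any phone with a hidePhoneNumber key?
before = set(key_paths(doc))
has_hidden_phone = "person.personPhones[].hidePhoneNumber" in before

# Change the structure by updating data: no schema migration step.
doc["person"]["personNickname"] = "Mikey"
after = set(key_paths(doc))
print(has_hidden_phone, sorted(after - before))
```

The same walk answers all three kinds of structural question on the slide: presence of a key, where it is nested, and which combinations of keys occur together.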

personId: 11
person:
  personName: Some very long name with many UTF-8 international characters…
  personBirthDate: 1982-02-02
  personHeightInFeet: 575
  personPhones: [
    phone:
      phonePurpose: [ mobile ]
      phoneNumber: +1 (222) 222-2222
    phone:
      phonePurpose: [ home ]
      phoneNumber: +1 (222) 222-2224
      hidePhoneNumber: true
  ]

Document PROs: Simple Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of connected nodes. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. JSON documents for Customers, Orders, Products, and Vendors, plus XML Documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manuals) and an XML hospital operation record with its administered drugs. Relationship labels: product that was ordered; vendor who sold the product; customer liked product; vendor received RMA on product.]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: Johns Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size    Dose UOM
Minicillan   Drugs R Us           200          mg
Maxicillan   Canada4Less Drugs    400          mg
Minicillan   Drug USA             150          mg


New Data Structures for Variety and Variability

Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, be any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.


Why is relational fixed and dense

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Callouts: Fixed field types; Fixed table structure; Fixed table relationships; Each row has same structure; Fixed set of tables
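The fixed, dense row structure can be seen in a small sketch using Python's built-in sqlite3 module. The table and column names (`operation_drugs`, `drug_notes`) are hypothetical stand-ins for the slide's drug table, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE operation_drugs (
           operation_id  INTEGER NOT NULL,
           drug_id       INTEGER NOT NULL,
           drug_dose     INTEGER,
           drug_dose_uom TEXT
       )"""
)
conn.executemany(
    "INSERT INTO operation_drugs VALUES (?, ?, ?, ?)",
    [(13, 100, 200, "mg"), (13, 101, None, None), (13, 100, 150, "mg")],
)

# A new kind of value needs a schema change before any row can hold it,
# and afterwards every row carries the column, mostly as NULL.
conn.execute("ALTER TABLE operation_drugs ADD COLUMN drug_notes TEXT")

rows = conn.execute("SELECT * FROM operation_drugs ORDER BY rowid").fetchall()
null_cells = sum(value is None for row in rows for value in row)
print(rows[0], null_cells)
conn.close()
```

Every row ends up with the same five columns whether or not it has a value for each, which is the "fixed and dense" property the slide describes.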

How does NoSQL support variety and variability

NoSQL documents have any variety of rapidly changing structures because structure is data

_id: 1
_type: operation
collections: [ operation, transplants ]
operation:
  hospitalName: Johns Hopkins
  operationTypeName: Heart Transplant
  surgeonName: Dorothy Oz
  operationNumber: 13
  administeredDrugs: [
    { drugName: Minocycline, drugManufacturer: Drugs R Us, drugDoseSize: 200, drugDoseUOM: mg }
    { drugName: Minomycin, drugManufacturer: Canada4Less, sparse property }
    { drugName: Minocycline, drugManufacturer: Drug USA, drugDoseSize: 150, drugDoseUOM: mg }
  ]
relations:
  values: [
    { subject: 1, predicate: opHospital, object: 10, hospitalAddress: 1057 Mayberry… }
    { subject: 1, predicate: opType, object: 100, insuranceCode: 21187 }
    { subject: 1, predicate: opSurgeon, object: 10000, surgeonSuccessRate: 087 }
    { subject: 1, predicate: opDrug, object: 10000, drugEfficacy: 08, drugRecalls: 1 }
    { subject: 1, predicate: opDrug, object: 20000, drugEfficacy: 05, drugRecalls: 3 }
    { subject: 1, predicate: opDrug, object: 30000, drugEfficacy: 07, drugRecalls: 1 }
  ]

Callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties; Sparse Denormalized Properties
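A hedged sketch of reading sparse properties and relationships-as-data from an operation document like the one on this slide. The nesting is an assumption reconstructed from the flattened slide text, and the drugEfficacy and surgeonSuccessRate values are omitted because their decimal points did not survive extraction.

```python
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "hospitalName": "Johns Hopkins",
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            # Sparse: this drug simply has no dose properties.
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
    "relations": {
        "values": [
            {"subject": 1, "predicate": "opHospital", "object": 10},
            {"subject": 1, "predicate": "opSurgeon", "object": 10000},
            {"subject": 1, "predicate": "opDrug", "object": 10000, "drugRecalls": 1},
            {"subject": 1, "predicate": "opDrug", "object": 20000, "drugRecalls": 3},
            {"subject": 1, "predicate": "opDrug", "object": 30000, "drugRecalls": 1},
        ]
    },
}

# Absent properties are just absent; .get() returns None instead of a stored NULL.
doses = [
    (drug["drugName"], drug.get("drugDoseSize"))
    for drug in operation_doc["operation"]["administeredDrugs"]
]

# Relationships are data, so they can be filtered like any other array.
drug_links = [
    rel["object"]
    for rel in operation_doc["relations"]["values"]
    if rel["predicate"] == "opDrug"
]
print(doses, drug_links)
```

Here the Minomycin record omits its dose entirely, whereas the relational drug table on the previous slide stores NULLs for it.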

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.


JSON is ideal for data structures
Developers work with data with maximal reliance on predictable structures

XML is ideal for content
Developers work with tagged content with minimal reliance on variable structure
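The contrast between tagged text and nested objects can be sketched with Python's standard library. The two snippets below are simplified versions of the slide's samples (attributes and namespaces are left out), so treat them as illustrative assumptions rather than the exact slide content.

```python
import json
import xml.etree.ElementTree as ET

# XML: the date is embedded in running text (mixed content).
xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"
)
root = ET.fromstring(xml_text)
xml_plain = "".join(root.itertext())   # text with the tags stripped away
xml_date = root.find("date").text      # the tagged value on its own

# JSON: every piece of text or data sits in its own object.
json_text = """
{"paragraph": [
    {"type": "text", "value": "Data exists in separate objects"},
    {"type": "date", "value": "2017-04-02Z"}
]}
"""
parts = json.loads(json_text)["paragraph"]
json_date = next(p["value"] for p in parts if p["type"] == "date")

print(xml_plain)
print(xml_date, json_date)
```

With XML the readable text survives when tags are ignored; with JSON the consumer relies on the predictable `type` and `value` keys to find anything.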


Choosing Between XML and JSON

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like

• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


The person ERD is 13 tables because each multivalued property requires a separate table in relational

All of this should be represented as one JSON document because it is one business entity
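Why multivalued properties multiply tables can be sketched with a deliberately naive rule: one table for the entity, plus one table for every array-valued property found anywhere in the document. The `shred_tables` helper and the tiny sample document are hypothetical; a real relational design (such as the 13-table person ERD) involves more judgment than this rule.

```python
def shred_tables(entity_name, document):
    """Return the set of table names a naive shredding would need."""
    tables = {entity_name}

    def walk(value):
        if isinstance(value, dict):
            for key, child in value.items():
                if isinstance(child, list):
                    tables.add(key)  # a multivalued property gets its own table
                walk(child)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    walk(document)
    return tables


person_doc = {
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
    ],
    "personEmails": [
        {"emailPurpose": ["work", "primary"], "emailAddress": "mike@example.com"},
    ],
}

tables = shred_tables("person", person_doc)
print(len(tables), sorted(tables))
```

Even this small document with two phone and email arrays needs five tables under the rule, while it remains a single JSON document.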

Person ERD

One Person JSON Doc = 13 Tables

personId: 11

schemas: [
  schema: { type: person, schemaVersion: 10 }
  schema: { type: customer, schemaVersion: 10 }
  schema: { type: companyRelationships, schemaVersion: 10 }
]

triples: [
  triple: { subject: 11, predicate: hasSpouse, object: 22 }
  triple: { subject: 11, predicate: hasChild, object: 33 }
  triple: { subject: 11, predicate: hasChild, object: 44 }
  triple: { subject: 11, predicate: likesProduct, object: 1111 }
  triple: { subject: 11, predicate: hasEmployer, object: 666, tripleStatus: active, tripleCreatedOn: 2016-10-05, tripleEffectivityStartDate: 1966-06-06, tripleEffectivityEndDate: null }
]

person:
  personStatus: active
  personName: Mike Bowers
  personOnlineName: MTB1
  personNameVariations:
    personFirstName: Mike
    personLastName: Bowers
    middleName: Thomas
    personMaidenName: null
    personLegalName: Michael Thomas Bowers
    personSortedName: Bowers Mike
    personNickname: Mikey
    personInformalLetterName: Mike
    personFormalLetterName: Mike Bowers
  personBirthDate: 1981-01-01
  personGender: male
  personEthnicity: caucasian
  personShippingPreference: free

  personPhones: [
    phone: { phonePurpose: [ mobile, primary ], phoneNumber: +1 (111) 111-1111 }
    phone: { phonePurpose: [ work ], phoneNumber: +1 (111) 111-1112 }
    phone: { phonePurpose: [ home ], phoneNumber: +1 (111) 111-1113 }
  ]

  personAddresses: [
    address: { addressPurpose: [ shipping, mailing ], addressStreet: [ 99 Smith Dr ], addressLocality: Clinton, addressRegion: UT, addressPostalCode: 84015, addressCountry: United States }
  ]

  personEmails: [
    email: { emailPurpose: [ work, primary ], emailAddress: mike@somework.com }
    email: { emailPurpose: [ personal ], emailAddress: mike@somewhere.com }
  ]

  personWebsites: [
    website: { websitePurpose: [ work ], websiteUrl: http://www.mike.com }
    website: { websitePurpose: [ linkedin, primary ], websiteUrl: https://linked.com/mike }
  ]

  personPaymentMethods: [
    paymentMethod: { paymentMethodType: Visa, paymentMethodStatus: Verified, creditCardNumber: 1234123412341234, creditCardExpirationDate: 2011-11-11, nameOnCreditCard: MIKE BOWERS, creditCardValidationAddress: mailing }
    paymentMethod: { paymentMethodType: Checking, paymentMethodStatus: Verified, bankRoutingNumber: 11111111, bankAccountNumber: 11111111, nameOnBankAccount: Michael Bowers, driversLicenseNumber: 11111111, driversLicenseState: UT }
  ]

customer:
  customerStatus: active
  customerJoinDate: 2001-01-01
  customerReviewerRank: 11111

companyRelationships: [
  company:
    companyIdFk: 555
    companyName: Nabisco
    personCompanyRelationship: [ employeeOf, accountRepFor ]
    personCompanyStatus: active
    personCompanyEffectiveness: Effective
    personCompanyStartDate: 2001-01-01
    personCompanyEndDate: null
    accountRepSupportedProducts: [
      product: { productIdFk: 1111, productName: Oreos }
    ]
  company:
    companyIdFk: 666
    companyName: Goliath Dairies
    personCompanyRelationship: [ independentContractorFor, deliveryPersonFor ]
    personCompanyStatus: active
    personCompanyEffectiveness: Effective
    personCompanyStartDate: 2001-01-01
    personCompanyEndDate: null
    deliverySchedule: 2 pm Mon-Fri
    performanceNotes: He is always late
]


[Figure: entity-relationship diagram for the customer order model, including tables such as Order, Order Items, Lookup Lists, and Fact; the remaining table boxes repeat the person-model columns under placeholder names.]

More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: graph of JSON documents (Customers, Orders, Products, Vendors) connected by labeled relationships: "Customer Order", "Product that was ordered", "Vendor who sold the product".]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize

• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

[Figure: example tables illustrating one-to-one, one-to-many, and many-to-many relationships and reference tables.]
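The flattening step can be sketched in a few lines of Python. This is only an illustration under assumed names: the `person` record, the `normalize` function, and the snake_case column names are hypothetical and do not come from the slides.

```python
# Hypothetical input: one person with a multivalued "phones" attribute.
person = {
    "person_id": 11,
    "name": "Mike Bowers",
    "phones": [
        {"purpose": "mobile", "number": "+1 (111) 111-1111"},
        {"purpose": "work", "number": "+1 (111) 111-1112"},
    ],
}


def normalize(record):
    """Split a record into a single-valued person row and one-to-many phone rows."""
    # The person row keeps only single-valued attributes.
    person_row = {k: v for k, v in record.items() if k != "phones"}
    # Each phone becomes its own row with a primary key and a foreign key.
    phone_rows = [
        {"phone_id": i, "person_id": record["person_id"], **phone}
        for i, phone in enumerate(record["phones"], start=1)
    ]
    return person_row, phone_rows


person_row, phone_rows = normalize(person)
print(person_row)
print(phone_rows)
```

Reassembling the person later requires a join from the phone rows back to the person row on `person_id`.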

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
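As a rough sketch of the first two bullets, the Python below folds hypothetical relational rows into one JSON document per person, embedding the phones as an array. The row layout and the `embed_phones` helper are assumptions made for illustration; the property names follow the person example in the slides.

```python
import json

# Hypothetical normalized rows, as a relational model might store them.
person_row = {"personId": 11, "personName": "Mike Bowers"}
phone_rows = [
    {"personId": 11, "phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
    {"personId": 11, "phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
    {"personId": 22, "phonePurpose": "home", "phoneNumber": "+1 (222) 222-2222"},
]


def embed_phones(person_row, phone_rows):
    """Build one document for a person, embedding that person's phones as an array."""
    phones = [
        # The foreign key is dropped: containment now expresses the relationship.
        {"phone": {k: v for k, v in row.items() if k != "personId"}}
        for row in phone_rows
        if row["personId"] == person_row["personId"]
    ]
    return {
        "personId": person_row["personId"],
        "person": {"personName": person_row["personName"], "personPhones": phones},
    }


person_doc = embed_phones(person_row, phone_rows)
print(json.dumps(person_doc, indent=2))
```

Each embedded phone stays normalized in the sense that it holds only phone attributes; no join is needed to read a person together with their phones.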

{ "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 10 } },
    { "schema": { "type": "customer", "schemaVersion": 10 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 10 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": {
    "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ] }

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
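The idea of "denormalizing without denormalizing" can be approximated with a small in-memory Python sketch. It does not use MarkLogic, TDE, or the Optic API; the `documents` store, the `triples` list, and both helper functions are hypothetical stand-ins that only show how a triple lets linked data be read alongside a document without copying it into that document.

```python
# Hypothetical document store keyed by entity ID (values abbreviated from the slides).
documents = {
    11: {"personName": "Mike Bowers"},
    666: {"companyName": "Goliath Dairies"},
}

# Triples link entities: (subject, predicate, object).
triples = [
    (11, "hasEmployer", 666),
    (11, "likesProduct", 1111),
]


def linked(subject, predicate):
    """Return the stored documents reachable from subject through predicate."""
    return [
        documents[obj]
        for subj, pred, obj in triples
        if subj == subject and pred == predicate and obj in documents
    ]


def person_with_employer(person_id):
    """Present employer names as if they were part of the person document."""
    person = documents[person_id]
    employers = linked(person_id, "hasEmployer")
    # A new view is built at query time; the stored document is not modified.
    return {**person, "employerNames": [e["companyName"] for e in employers]}


print(person_with_employer(11))
```

The stored person document never gains a copy of the company name, so a company rename needs to be written in one place only.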


Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: example business tables, transaction tables, and reference tables.]


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one document
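As an illustration of the last bullet, the sketch below keeps two lookup lists in one document and checks an order against it. The status values are taken from the lookup example in the slides; the trimmed `order_doc`, the `bad_order` case, and the `invalid_statuses` helper are hypothetical.

```python
# One lookup document holding several lookup lists.
lookup_doc = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": [
        "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock",
    ],
}

# A trimmed order transaction with one ordered product.
order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        ],
    },
}


def invalid_statuses(order_doc, lookup_doc):
    """Return (property, value) pairs whose value is missing from the lookup lists."""
    problems = []
    body = order_doc["order"]
    if body["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(("orderStatus", body["orderStatus"]))
    for item in body["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookup_doc["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


print(invalid_statuses(order_doc, lookup_doc))

# A hypothetical order with a status that is not in the lookup list.
bad_order = {"orderId": 2, "order": {"orderStatus": "Lost", "orderedProducts": []}}
print(invalid_statuses(bad_order, lookup_doc))
```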

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the subclassing sections below

Subclass sections of the person document (personId 11); the person section is the same as in the Normalize example:

  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 10 } },
    { "schema": { "type": "customer", "schemaVersion": 10 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 10 } } ],
  "customer": {
    "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
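One way to read the subclassing in the person document is to treat each optional top-level section as a role that the general person entity also plays. The Python below is a minimal sketch of that reading; the `roles` helper and the trimmed `person_doc` are hypothetical, and the nesting of the sections is an assumption about the slide's JSON.

```python
# Trimmed person document: a general "person" plus optional subclass sections.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {
            "companyIdFk": 555,
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        }},
    ],
}


def roles(doc):
    """List the roles a person document carries, based on which sections exist."""
    found = ["person"]
    if "customer" in doc:
        found.append("customer")
    # Each company relationship contributes its own role names.
    for relationship in doc.get("companyRelationships", []):
        found.extend(relationship["company"]["personCompanyRelationship"])
    return found


print(roles(person_doc))
print(roles({"personId": 22, "person": {"personName": "Someone Else"}}))
```

A document without the optional sections is still a valid person, which is why adding a new role does not require restructuring existing documents.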

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
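To see why unique property names matter when an index is keyed on the name alone, consider this toy in-memory index in Python. It is an analogy only, not MarkLogic's actual index implementation; the `walk` and `build_index` helpers and the document keys `person-11` and `order-1` are hypothetical.

```python
from collections import defaultdict


def walk(node, name=None):
    """Yield (property name, scalar value) pairs at any depth, ignoring the path."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, key)
    elif isinstance(node, list):
        # Array items inherit the name of the property that holds the array.
        for item in node:
            yield from walk(item, name)
    else:
        yield name, node


def build_index(docs):
    """Map (property name, value) to the set of document keys containing it."""
    index = defaultdict(set)
    for doc_key, doc in docs.items():
        for name, value in walk(doc):
            index[(name, value)].add(doc_key)
    return index


docs = {
    "person-11": {"personId": 11, "person": {
        "personName": "Mike Bowers",
        "personPhones": [{"phone": {
            "phonePurpose": ["mobile", "primary"],
            "phoneNumber": "+1 (111) 111-1111"}}]}},
    "order-1": {"orderId": 1, "order": {
        "customer": {"customerId": 11, "customerName": "Mike Bowers"}}},
}

index = build_index(docs)
print(index[("personName", "Mike Bowers")])
print(index[("customerName", "Mike Bowers")])
```

Because the person and order documents use `personName` and `customerName` rather than a shared `name` property, an equality lookup on either name returns only the documents of the intended kind.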


Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database
  • You have to join 13 tables to retrieve all of a person's data
  • Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
  • 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
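The IO estimate above can be checked with a few lines of arithmetic. This is only a sketch that uses the figures quoted on the slide (13 joins, 3-to-4 index IOs and 1-to-2 row IOs per join); the constant and function names are illustrative, and real IO counts depend on caching and the storage engine.

```python
# Back-of-the-envelope IO estimate using only the figures quoted on the slide.
TABLES_JOINED = 13            # tables joined to assemble one person
INDEX_IOS_PER_JOIN = (3, 4)   # low and high estimate of index reads per join
ROW_IOS_PER_JOIN = (1, 2)     # low and high estimate of row reads per join


def relational_io_range(joins: int) -> tuple[int, int]:
    """Return the (low, high) IO estimate for a query that needs `joins` joins."""
    low = (INDEX_IOS_PER_JOIN[0] + ROW_IOS_PER_JOIN[0]) * joins
    high = (INDEX_IOS_PER_JOIN[1] + ROW_IOS_PER_JOIN[1]) * joins
    return low, high


low, high = relational_io_range(TABLES_JOINED)
print(f"relational: {low}-to-{high} IOs, document: 1 IO")
```

The per-join range is 4-to-6 IOs, so multiplying by 13 joins gives the slide's 52-to-78 range.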


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

The same data in a relational model takes three tables:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
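To make the multi-valued idea concrete, here is a minimal sketch in plain Python (not MarkLogic's Optic API). It assumes each array item wraps its fields in an "address" object, as the slide's sample suggests; the helper name addresses_for is hypothetical.

```python
# One person document holding a multi-valued address with a multi-valued purpose.
person = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }},
    ],
}


def addresses_for(doc: dict, purpose: str) -> list[dict]:
    """Return every address whose multi-valued addressPurpose includes `purpose`."""
    return [
        item["address"]
        for item in doc.get("personAddresses", [])
        if purpose in item["address"].get("addressPurpose", [])
    ]


print(addresses_for(person, "mailing")[0]["addressLocality"])
```

No join is needed: the address and its purposes travel with the person document, where the relational version needs the Persons, Persons Addresses, and Addresses tables.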

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed

• Different types of records in the same array is natural to do in programming, but very difficult to do in relational databases

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
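A short sketch of how application code might consume such a multi-typed array. It is plain Python under the assumption that every item wraps its fields in a "paymentMethod" object and that paymentMethodType tells the record types apart; the describe helper and its output wording are illustrative only.

```python
# Two differently shaped records living in the same array.
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    }},
]


def describe(method: dict) -> str:
    """Summarize a payment method, branching on its declared type."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa card ending in {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending in {method['bankAccountNumber'][-4:]}"
    return f"Unknown payment method type: {kind}"


summaries = [describe(item["paymentMethod"]) for item in payment_methods]
print(summaries)
```

In a relational model each record type would typically get its own table plus a linking table, as in the Visa and Bank payment method tables of the person ERD.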


Document PROs: Everything is Data

Everything is data in JSON

• Structure
• Keys
• Values
• Arrays

Because structure is data

• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": true } }
    ]
  }
}
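The claim that structure can be queried and changed like data can be sketched in plain Python. This is not a MarkLogic query; key_paths and has_key are hypothetical helpers, and the document is a shortened copy of the sample above.

```python
from typing import Any, Iterator

doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def key_paths(node: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield the path to every key in a nested JSON-like value."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + (key,)
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from key_paths(item, prefix + (index,))


def has_key(document: Any, key: str) -> bool:
    """Query the structure: is `key` present anywhere in the document?"""
    return any(path[-1] == key for path in key_paths(document))


# Query for presence of a key and for a nested relationship.
print(has_key(doc, "hidePhoneNumber"))
print([p for p in key_paths(doc) if p[-1] == "hidePhoneNumber"])

# Change the structure by updating data: add a new key to the first phone.
doc["person"]["personPhones"][0]["phone"]["hidePhoneNumber"] = False
```

Because keys are just data, adding hidePhoneNumber to the first phone needs no schema change, only an update.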

Document PROs: Simple Easy Data Types

– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have unlimited number of attributes, can be nested, and can be sparsely populated

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION. This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS. … Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one

1.2.2 Indexing Dependence. … Can application programs … remain invariant as indices come and go …

1.2.3 Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths

[Figure: graph of connected business entities. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: document collections connected by labeled graph relationships. Nodes: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), Documentation (XML, shown with a sample hospital operation record and its drug table). Edge labels: Customer Order; Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models?
  • Multi-Model Enterprise NoSQL Databases
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense?
  • How does NoSQL support variety and variability?
  • Choosing Between XML and JSON
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like?
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Same Realistic Customer Order Model
  • Relational Modeling: Normalize
  • DocGraph Modeling: Normalize
  • DocGraph Modeling: Denormalize
  • Relational Modeling: Orthogonalize
  • Relational Modeling: Generalize
  • DocGraph Modeling: Generalize
  • Relational Modeling: Tune
  • DocGraph Modeling: Tune
  • Document PROs: Simple Easy Data Types

Hospital Name: Johns Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size   Dose UOM
Minicillan   Drugs R Us           200         mg
Maxicillan   Canada4Less Drugs    400         mg
Minicillan   Drug USA             150         mg


Why is relational fixed and dense?

Surgeon ID   Surgeon Name    Surgeon Specialty
1            Dorothy Oz      Cardiothoracic
2            Martin Fields   Neurosurgery

Operation ID   Surgeon ID   Hospital ID
13             1            7

Operation ID   Drug ID   Drug Dose   Drug Dose UOM   Sparse
13             100       200         mg              NULL
13             101       NULL        NULL
13             100       150         mg              NULL

Drug ID   Drug Name
100       Minocycline
101       Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Callouts: Fixed field types; Fixed table structure; Fixed table relationships; Each row has same structure; Fixed set of tables

How does NoSQL support variety and variability?

NoSQL documents can have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties (the second administered drug carries no dose); Sparse Denormalized Properties
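As a small illustration of sparse properties, the sketch below reads the three administered drugs from the sample document in plain Python. The function names are hypothetical; the point is only that an absent property is handled as "no value" rather than as a NULL column.

```python
# The three administered drugs from the sample operation document.
administered_drugs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # sparse: no dose
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def total_dose_mg(drugs: list[dict]) -> int:
    """Sum dose sizes, skipping entries where the sparse dose properties are absent."""
    return sum(
        drug["drugDoseSize"]
        for drug in drugs
        if "drugDoseSize" in drug and drug.get("drugDoseUOM") == "mg"
    )


def drugs_without_dose(drugs: list[dict]) -> list[str]:
    """List drug names whose dose was never recorded."""
    return [drug["drugName"] for drug in drugs if "drugDoseSize" not in drug]


print(total_dose_mg(administered_drugs), drugs_without_dose(administered_drugs))
```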

Choosing Between XML and JSON

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
    freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" } ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects; comments; attributes for metadata; CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.

JSON is ideal for data structures. Developers work with data with maximal reliance on predictable structures.

XML is ideal for content. Developers work with tagged content with minimal reliance on variable structure.

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Customer Model

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.

All of this should be represented as one JSON document because it is one business entity.
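The reason each multivalued property costs a table can be sketched in a few lines of Python. This is a deliberately simplified model, not the author's tooling: the person document is cut down to two multivalued properties, purposes are flattened to single strings, and the shred helper and table names are hypothetical.

```python
from collections import defaultdict

# Simplified person document with two multivalued properties.
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
    ],
    "personEmails": [
        {"emailPurpose": "work", "emailAddress": "mike@somework.com"},
    ],
}


def shred(doc: dict, id_key: str, root_table: str) -> dict[str, list[dict]]:
    """Split a document into relational-style tables.

    Scalar properties stay in the root row; every multivalued (list)
    property becomes its own table whose rows carry the parent id.
    """
    tables: dict[str, list[dict]] = defaultdict(list)
    root_row = {}
    for key, value in doc.items():
        if isinstance(value, list):
            for item in value:
                tables[key].append({id_key: doc[id_key], **item})
        else:
            root_row[key] = value
    tables[root_table].append(root_row)
    return dict(tables)


tables = shred(person, "personId", "persons")
for name, rows in tables.items():
    print(name, len(rows))
```

Two multivalued properties already produce three tables; the full person document, with names, addresses, websites, payment methods, and company relationships, is how the ERD reaches 13.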

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 10 } },
    { "schema": { "type": "customer", "schemaVersion": 10 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 10 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}


More Realistic Customer Order Model

[Figure: entity-relationship diagram with several dozen relational tables, including Order, Order Items, Fact, and Lookup Lists; the remaining table labels are placeholder text and the column lists repeat those of the Customer Model.]

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: four JSON document collections (Customers, Orders, Products, Vendors) connected by labeled relationships: Customer Order, Product that was ordered, Vendor who sold the product.]
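The two bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys such as `customer/11` and the predicate names (`placedOrder`, `orderedProduct`, `soldByVendor`) are hypothetical and do not come from the slides, and plain Python lists stand in for a real triple store.

```python
# Hypothetical in-memory stand-ins: one JSON-like document per business entity.
documents = {
    "customer/11": {"personId": 11, "personName": "Mike Bowers"},
    "order/1": {"orderId": 1, "orderNumber": "111-11-111"},
    "product/1111": {"productId": 1111, "productName": "Oreo Thins Sandwich Cookies"},
    "vendor/555": {"companyId": 555, "companyName": "Nabisco"},
}

# Each relationship is a (subject, predicate, object) triple with an explicit meaning.
# The predicate names are illustrative assumptions.
triples = [
    ("customer/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("product/1111", "soldByVendor", "vendor/555"),
]


def related(subject, predicate):
    """Return the documents linked from `subject` by `predicate`."""
    return [documents[o] for s, p, o in triples if s == subject and p == predicate]


def follow(start, *predicates):
    """Walk a chain of predicates starting from one document key."""
    frontier = [start]
    for predicate in predicates:
        frontier = [o for key in frontier
                    for s, p, o in triples if s == key and p == predicate]
    return [documents[key] for key in frontier]
```

Calling `follow("customer/11", "placedOrder", "orderedProduct", "soldByVendor")` walks from the customer to the vendor who sold the ordered product, with each hop named by its predicate rather than implied by a foreign key.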

{ "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{ "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New"
          …

{ "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ …

{ "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" …
      { "phone": { "phonePurpose": "support" …
      { "phone": { "phonePurpose": "shipping" …
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping" …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936" …
    "companyEmails": [
      { "email": { "emailPurpose": "sales" …
    "companyWebsites": [
      { "website": { "websitePurpose": "home" …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555…
          "paymentMethodVerified": true …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…

{ "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize

• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

[Figure: example tables labeled One-to-one, One-to-many, Many-to-many, and Reference Tables.]
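As a rough illustration of the flattening step described above, the following Python sketch splits a record with a multivalued phone attribute into a parent row and one-to-many child rows. The field names (`person_id`, `phones`, `phone_id`) are assumptions chosen for the example, not names from the slides.

```python
def normalize_person(person):
    """Split a record with a multivalued `phones` attribute into two flat tables.

    Returns one row for the person table and a list of rows for the phone table,
    where each phone row points back to its person through a foreign key.
    """
    person_row = {"person_id": person["person_id"], "name": person["name"]}
    phone_rows = [
        {"phone_id": i, "person_id": person["person_id"], "phone_number": number}
        for i, number in enumerate(person["phones"], start=1)
    ]
    return person_row, phone_rows
```

Reading the person back now requires a join between the two row sets on `person_id`, which is the cost the document model avoids by keeping the phones inside the person.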

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• Template Driven Extraction (TDE) can automatically populate the triple index to denormalize data, and the Optic API can query it

{ "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ]
  },
  "customer": {
    "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
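To make the point about embedded, multi-valued structures concrete, here is a small Python sketch that reads phone numbers straight out of a person document without any join. The nesting used here (each array item wrapped in a `phone` object inside `person.personPhones`) is an assumption about how the slide's JSON is laid out, and `phones_for` is a hypothetical helper.

```python
# A cut-down person document; the wrapper-object layout is an assumed reading of the slide.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}


def phones_for(doc, purpose):
    """Return every phone number in the document tagged with `purpose`."""
    return [
        item["phone"]["phoneNumber"]
        for item in doc["person"]["personPhones"]
        if purpose in item["phone"]["phonePurpose"]
    ]
```

Because `phonePurpose` is itself an array, one phone can serve several purposes (for example both mobile and primary) without a separate many-to-many table.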

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
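The idea of "denormalizing without denormalizing" can be sketched in plain Python. This is not TDE template syntax or the Optic API: `project_triples` only imitates what an extraction template does conceptually, and the key format (`person/11`) and the `orderedBy` predicate are hypothetical.

```python
person_doc = {"personId": 11, "person": {"personName": "Mike Bowers"}}
order_doc = {"orderId": 1, "order": {"customer": {"customerId": 11}}}


def project_triples(person, order):
    """Copy selected values out of the documents into index rows.

    The documents themselves are left unchanged; only the index holds the copies.
    """
    person_key = f"person/{person['personId']}"
    order_key = f"order/{order['orderId']}"
    customer_key = f"person/{order['order']['customer']['customerId']}"
    return [
        (person_key, "personName", person["person"]["personName"]),
        (order_key, "orderedBy", customer_key),
    ]


def customer_name_for_order(order_key, triples):
    """Join order -> customer -> name using only the projected triples."""
    customers = {o for s, p, o in triples if s == order_key and p == "orderedBy"}
    names = [o for s, p, o in triples if s in customers and p == "personName"]
    return names[0] if names else None
```

The order document never stores the customer's name, yet a query over the index can return it alongside the order, so the name lives in exactly one document.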


Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: example tables labeled Business Tables, Transaction Tables, and Reference Tables.]


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one document
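One way to picture the last bullet is a single lookup document that holds several value lists, used to check the status fields of a transaction. The sketch below is an assumption-laden illustration: the list contents follow the order lookup example in this deck, while `invalid_statuses` and the cut-down order shape are hypothetical.

```python
# One lookup document combining two lists that a relational model would keep in two tables.
order_lookup = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def invalid_statuses(order, lookup):
    """Return (property, value) pairs in the order that are not in the lookup lists."""
    problems = []
    if order["orderStatus"] not in lookup["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for item in order.get("orderedProducts", []):
        status = item["product"]["productOrderStatus"]
        if status not in lookup["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems
```

A single read of the lookup document is enough to validate every status in the order, which is the practical benefit of combining related lists.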

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not overgeneralize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the customer and companyRelationships sections of the person document for the subclassing
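A minimal sketch of how this kind of subclassing might be read back from a document: the roles a person plays are simply the sections present in the document. The section names follow the person example in this deck; `roles_of` is a hypothetical helper and the nesting is an assumed reading of the slide's JSON.

```python
def roles_of(doc):
    """List the roles a person document carries, based on which sections it contains."""
    roles = ["person"] if "person" in doc else []
    if "customer" in doc:
        roles.append("customer")
    # Each company relationship names one or more roles the person plays for that company.
    for item in doc.get("companyRelationships", []):
        roles.extend(item["company"]["personCompanyRelationship"])
    return roles


person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
    ],
}
```

Adding a new role means adding a new section to the document, so the general person entity and its specific roles stay visible side by side.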


Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• The JSON examples in this deck are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes (range indexes) based on property name or hierarchical path
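The reason unique property names matter can be shown with a toy equality index keyed by property name and value, ignoring the path to the property. This is a simplified Python model for illustration only, not MarkLogic's actual index implementation; the document URIs are hypothetical.

```python
from collections import defaultdict


def index_document(uri, node, index, name=None):
    """Record (property name, value) -> document URIs, ignoring where the property sits."""
    if isinstance(node, dict):
        for child_name, child in node.items():
            index_document(uri, child, index, child_name)
    elif isinstance(node, list):
        # Array items are indexed under the name of the property that holds the array.
        for item in node:
            index_document(uri, item, index, name)
    elif name is not None:
        index[(name, node)].add(uri)


index = defaultdict(set)
index_document("/person/11.json",
               {"personId": 11,
                "person": {"personName": "Mike Bowers",
                           "personPhones": [{"phone": {"phonePurpose": ["mobile", "primary"]}}]}},
               index)
index_document("/company/555.json",
               {"companyId": 555, "company": {"companyName": "Nabisco"}},
               index)
```

Because the key is the property name alone, a generic property such as `name` used in both documents would share one index entry space, whereas `personName` and `companyName` keep person and company lookups separate.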


Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document

• The five JSON documents of the customer order model (person, order, product, company, and lookup lists) equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs Performance

One JSON document requires one I/O

• Relational requires dozens of I/Os for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row.
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os.

• MarkLogic indexes are in RAM, so getting a doc is 1 I/O.

• MongoDB indexes are B-tree indexes and may require 4 I/Os.

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
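The I/O arithmetic on this slide can be checked with a short sketch. The function name and constants below are illustrative only; the per-join figures are the slide's own estimates, not measurements.

```python
# Back-of-the-envelope I/O estimate using the figures quoted on the slide.
JOINS = 13                 # tables joined to assemble one person
INDEX_IOS = (3, 4)         # estimated I/Os per join spent on indexes (low, high)
ROW_IOS = (1, 2)           # estimated I/Os per join spent fetching the row (low, high)


def relational_io_range(joins, index_ios, row_ios):
    """Return the (low, high) total I/O estimate for joining `joins` tables."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


low, high = relational_io_range(JOINS, INDEX_IOS, ROW_IOS)
print(f"relational: {low}-to-{high} I/Os; single document: 1 I/O")
```

The range comes out the same as the slide's 52-to-78 figure because each join is costed at 4-to-6 I/Os in total.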


Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

Relational equivalent:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
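To make the contrast concrete, here is a minimal sketch that takes the embedded address array and shreds it into the rows a relational model would need. The exact JSON nesting is an assumption reconstructed from the slide, and `shred_addresses` and the generated Address ID values are hypothetical.

```python
# One document holds the address and all of its purposes.
person = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }}
    ],
}


def shred_addresses(doc):
    """Split embedded addresses into Addresses rows and Persons Addresses rows."""
    addresses, persons_addresses = [], []
    for address_id, item in enumerate(doc["personAddresses"], start=1):
        addr = item["address"]
        addresses.append({
            "Address ID": address_id,  # surrogate key invented for the example
            "Street": ", ".join(addr["addressStreet"]),
            "Locality": addr["addressLocality"],
            "Region": addr["addressRegion"],
            "Postal Code": addr["addressPostalCode"],
            "Country": addr["addressCountry"],
        })
        # Each purpose becomes its own junction-table row.
        for purpose in addr["addressPurpose"]:
            persons_addresses.append({
                "Person ID": doc["personId"],
                "Address ID": address_id,
                "Address Purpose": purpose,
            })
    return addresses, persons_addresses


addresses, persons_addresses = shred_addresses(person)
print(len(addresses), "address row,", len(persons_addresses), "junction rows")
```

Reading the person back from the relational form means joining these tables again, whereas the document already carries the array in place.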

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed

• Storing different types of records in the same array is natural in programming but very difficult in relational databases.

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
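As a rough illustration of why a multi-typed array is natural in application code, the sketch below walks the two payment methods and branches on `paymentMethodType`. The nesting is an assumption reconstructed from the slide, and `masked_account` is a hypothetical helper.

```python
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT",
    }},
]


def masked_account(method):
    """Describe a payment method using the field that its type actually has."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        number = method["creditCardNumber"]
    elif kind == "Checking":
        number = method["bankAccountNumber"]
    else:
        raise ValueError(f"unknown payment method type: {kind}")
    return f"{kind} ending in {number[-4:]}"


summaries = [masked_account(item["paymentMethod"]) for item in payment_methods]
print(summaries)
```

Both record shapes live in one array; a relational design would typically need one table per payment type plus junction tables.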


Document PROs Everything is Data

Everything is data in JSON

• Structure
• Keys
• Values
• Arrays

Because structure is data

• Structure can be changed by updating data: just write a query.
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values

"personId": 11,
"person": {
  "personName": "Some very long name with many UTF-8 international characters…",
  "personBirthDate": "1982-02-02",
  "personHeightInFeet": 575,
  "personPhones": [
    { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
    { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
  ]
}
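A small sketch of "structure can be queried": because keys are just data, a generic walk over the document can report where a key is present. This is plain Python over an in-memory document, not MarkLogic's query API; `paths_with_key` is a hypothetical helper and the document shape is reconstructed from the slide.

```python
def paths_with_key(node, key, path=()):
    """Yield the path to every occurrence of `key` in a JSON-like value."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from paths_with_key(value, key, path + (name,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from paths_with_key(item, key, path + (index,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

hidden = list(paths_with_key(doc, "hidePhoneNumber"))
numbers = list(paths_with_key(doc, "phoneNumber"))
print(hidden)
print(len(numbers), "phone numbers found")
```

Only the second phone carries `hidePhoneNumber`, so the presence query returns a single path; no schema change was needed to add that key.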

Document PROs Simple Easy Data Types

– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
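The type rules listed above can be seen by parsing a small document with Python's standard `json` module. The property names and values here are invented for the illustration.

```python
import json

# A made-up document exercising each JSON type, including a mixed-type array.
text = """
{
  "name with spaces and ünïcode": "Zoë",
  "count": 3,
  "ratio": 0.25,
  "active": true,
  "mixed": [1, "two", false, null, {"nested": {}}]
}
"""

value = json.loads(text)

# Map each top-level property to the Python type the parser produced.
kinds = {key: type(item).__name__ for key, item in value.items()}
mixed_kinds = [type(item).__name__ for item in value["mixed"]]

print(kinds)
print(mixed_kinds)
```

Integers and decimals are both JSON numbers, but Python's parser surfaces them as `int` and `float`; the array mixes five different types without any declaration.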

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6

June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION. This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS. … Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence. … Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of entities connected by labeled relationships. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", "solution".]

[Figure: Customer Order model. JSON document collections (Customers, Orders, Products, Vendors) and XML Documentation (illustrated by a hospital operation record with its administered drugs) are connected by labeled relationships: "Product that was ordered", "Vendor who sold the product", "Shipping instructions etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", "Vendor received RMA on product".]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | Operation Number | Operation Type | Surgeon Name
John Hopkins | 13 | Heart Transplant | Dorothy Oz


How does NoSQL support variety and variability

NoSQL documents have any variety of rapidly changing structures because structure is data.

{ "_id": 1, "_type": "operation", "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins", "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz", "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ] },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ] } }

Callouts on the document:

• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties (the second administered drug has no dose size or unit)
• Sparse Denormalized Properties
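The "sparse properties" callout can be sketched in a few lines: code that reads such documents treats a missing property as normal rather than as an error. The list below mirrors the slide's administered drugs; `describe_dose` is a hypothetical helper.

```python
administered_drugs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    # Sparse: this entry simply omits the dose properties.
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def describe_dose(drug):
    """Format a dose, tolerating documents where the dose was never recorded."""
    size = drug.get("drugDoseSize")
    if size is None:
        return f"{drug['drugName']}: dose not recorded"
    return f"{drug['drugName']}: {size} {drug['drugDoseUOM']}"


descriptions = [describe_dose(drug) for drug in administered_drugs]
recorded_total = sum(d.get("drugDoseSize", 0) for d in administered_drugs)

print(descriptions)
print("total recorded dose:", recorded_total)
```

A relational table would store NULLs in the dose columns for the second row; in the document the properties are just absent.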

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" } ] } ] }

JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
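The difference between "tags added on top of text" and "text poured into objects" shows up as soon as the two formats are parsed. The sketch below uses Python's standard `xml.etree.ElementTree` and `json` modules on simplified fragments of the slide's samples (namespaces and attributes are left out to keep it short).

```python
import json
import xml.etree.ElementTree as ET

# XML: a date element sits in the middle of running text (mixed content).
xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"
)
para = ET.fromstring(xml_text)
date = para[0]
# ElementTree exposes mixed content as text before the child, the child's
# own text, and the "tail" text that follows the child.
xml_parts = [para.text, date.text, date.tail]
full_text = "".join(para.itertext())

# JSON: the same idea must be expressed as a list of typed objects.
json_text = """
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}
]}
"""
json_parts = [(p["type"], p["value"]) for p in json.loads(json_text)["paragraph"]]

print(xml_parts)
print(json_parts)
```

In the XML case the text survives intact if the tags are ignored; in the JSON case the text only exists as values inside objects.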


JSON is ideal for data structures

Developers work with data with maximal reliance on predictable structures

XML is ideal for content

Developers work with tagged content with minimal reliance on variable structure


Choosing Between XML and JSON

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.

All of this should be represented as one JSON document because it is one business entity.

Person ERD

One Person JSON Doc = 13 Tables

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
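As a rough sketch of why one document maps to many tables, the code below walks a cut-down version of the person document and lists every multi-valued (array) property, each of which would need its own table or junction table in a relational model. The nesting is an assumption reconstructed from the flattened slide text, the email address is a placeholder, and `multivalued_paths` is a hypothetical helper; it is a rule of thumb, not the author's exact table count.

```python
def multivalued_paths(node, path=""):
    """Return the dotted paths of all array-valued properties, without repeats."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if isinstance(value, list):
                found.append(child)
            for p in multivalued_paths(value, child):
                if p not in found:
                    found.append(p)
    elif isinstance(node, list):
        # Array items share the array's path, so repeated shapes collapse.
        for item in node:
            for p in multivalued_paths(item, path):
                if p not in found:
                    found.append(p)
    return found


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"],
                       "emailAddress": "mike@example.com"}},  # placeholder value
        ],
    },
}

arrays = multivalued_paths(person_doc)
for p in arrays:
    print(p)
```

Even this fragment yields four array-valued properties; the full document adds addresses, websites, payment methods, triples, and company relationships.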


[Figure: entity-relationship diagram of a large relational schema with many interrelated tables, including Order, Order Items, Fact, and Lookup Lists.]

More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities

[Figure: the Customer Order model as JSON document collections (Customers, Orders, Products, Vendors) connected by graph relationships labeled "Product that was ordered" and "Vendor who sold the product"]

Person document:

{ "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 } }

Order document:

{ "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
          … } } ] } }

Product document:

{ "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks", … ],
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": {
          "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", … } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ … ] } }

Company document:

{ "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": ["shipping", … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555…,
          "paymentMethodVerified": true } } ] },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" } }

Lookup document:

{ "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"] }


Relational Modeling: Normalize

1. Normalize

• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure: One-to-one, One-to-many, Many-to-many, Reference Tables]
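To make the flattening step concrete, here is a minimal Python sketch. It assumes a small nested person record; the table and column names (person, phone, phone_purpose) are hypothetical and chosen only for illustration. It shows how each multivalued attribute turns into its own one-to-many table.

```python
def normalize_person(record):
    """Flatten a nested person record into single-valued rows, grouped by table."""
    person_id = record["personId"]
    tables = {
        "person": [{"person_id": person_id, "person_name": record["personName"]}],
        "phone": [],          # one-to-many: person -> phone
        "phone_purpose": [],  # one-to-many: phone -> purpose
    }
    for phone_id, phone in enumerate(record["personPhones"], start=1):
        tables["phone"].append(
            {"phone_id": phone_id, "person_id": person_id, "phone_number": phone["phoneNumber"]}
        )
        # Each value of the multivalued attribute becomes its own row.
        for purpose in phone["phonePurpose"]:
            tables["phone_purpose"].append({"phone_id": phone_id, "phone_purpose": purpose})
    return tables


nested = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
}
tables = normalize_person(nested)
for name, rows in tables.items():
    print(name, len(rows))
```

One person with two phones already needs three tables here, which is the "shredding" effect the relational steps produce.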

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it

{ "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ] }
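As a rough sketch of "one business entity per JSON document", the Python below folds flat rows back into a single person document with an embedded phone array. The row shapes and the helper name `build_person_document` are assumptions made for illustration; only the property names come from the slide.

```python
import json


def build_person_document(person_row, phone_rows):
    """Embed a person's phone rows as an array inside one person document."""
    phones = [
        {"phone": {"phonePurpose": row["purposes"], "phoneNumber": row["number"]}}
        for row in phone_rows
        if row["person_id"] == person_row["person_id"]
    ]
    return {
        "personId": person_row["person_id"],
        "person": {"personName": person_row["name"], "personPhones": phones},
    }


person_row = {"person_id": 11, "name": "Mike Bowers"}
phone_rows = [
    {"person_id": 11, "purposes": ["mobile", "primary"], "number": "+1 (111) 111-1111"},
    {"person_id": 11, "purposes": ["work"], "number": "+1 (111) 111-1112"},
    {"person_id": 22, "purposes": ["home"], "number": "+1 (222) 222-2222"},  # belongs to someone else
]
document = build_person_document(person_row, phone_rows)
print(json.dumps(document, indent=2))
```

Each embedded phone is still a small normalized structure; it simply lives inside the entity it describes instead of in a separate table.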

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
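The idea of "denormalizing without denormalizing" can be sketched in plain Python. This is a toy stand-in, not the TDE or Optic API: a list of tuples plays the role of the triple index, the `hasName` predicate and the spouse document are hypothetical, and the lookup function imitates a join across linked documents.

```python
def project_triples(doc):
    """Collect (subject, predicate, object) tuples from one document.

    Stand-in for what a template might project into a triple index:
    one data value (the name) plus the document's explicit triples.
    """
    triples = [(doc["personId"], "hasName", doc["person"]["personName"])]
    for entry in doc.get("triples", []):
        t = entry["triple"]
        triples.append((t["subject"], t["predicate"], t["object"]))
    return triples


def related_names(index, subject, predicate):
    """Follow a relationship triple, then read the linked entity's name from the index."""
    names = {s: o for s, p, o in index if p == "hasName"}
    return [names[o] for s, p, o in index if s == subject and p == predicate and o in names]


mike = {
    "personId": 11,
    "triples": [{"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}}],
    "person": {"personName": "Mike Bowers"},
}
spouse = {"personId": 22, "person": {"personName": "Pat Bowers"}}  # hypothetical document

index = project_triples(mike) + project_triples(spouse)
print(related_names(index, 11, "hasSpouse"))
```

The spouse's name is never copied into Mike's document, yet a query over the index can return it as if it were there.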

[Figure: the same person document (personId 11) shown on the preceding Normalize slide]

Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: Business Tables, Transaction Tables, Reference Tables]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc

[Figure: the same five JSON documents shown with the Customer Order model above: person, order, product, company, and the order-status lookup lists]
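A small Python sketch of the last bullet: two lookup lists live in one lookup document, and an order document is checked against both. The status values are as read from the slide; the function name and the "Lost" status are made up for the example.

```python
LOOKUP = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order_doc, lookup):
    """Return (property, value) pairs whose value is not in the matching lookup list."""
    order = order_doc["order"]
    problems = []
    if order["orderStatus"] not in lookup["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for item in order["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookup["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}},  # not in the lookup list
        ],
    },
}
print(invalid_statuses(order_doc, LOOKUP))
```

The order document carries everything about the transaction, while the shared vocabulary of states stays in a single lookup document.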

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON, and unlike relational it does not hide the purpose of the model
• See subclassing below

[Figure: the same person document (personId 11) shown on the Normalize slide; its customer and companyRelationships sections subclass the person as a customer, employee, and account rep]
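One way to read the subclassing in that document is that each optional section adds a role to the base person. The Python sketch below, with the hypothetical helper `person_roles`, lists the roles a document carries; it is an illustration of the idea, not part of any MarkLogic API.

```python
def person_roles(doc):
    """List the roles a person document plays, based on which sections it carries."""
    roles = ["person"]
    if "customer" in doc:
        roles.append("customer")
    for entry in doc.get("companyRelationships", []):
        roles.extend(entry["company"]["personCompanyRelationship"])
    return roles


doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}
print(person_roles(doc))
```

A person who is not a customer simply omits the `customer` section, so the general model stays readable without extra subclass tables.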

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
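Why unique property names matter can be shown with a toy index in Python. This is only a simplified model of a name-keyed equality index, not how MarkLogic implements it: every scalar value is filed under its property name, whatever its depth in the document. The two sample documents are hypothetical.

```python
from collections import defaultdict


def index_by_property_name(node, index=None, name=None):
    """Map each property name to the scalar values found under it, at any depth."""
    if index is None:
        index = defaultdict(set)
    if isinstance(node, dict):
        for key, value in node.items():
            index_by_property_name(value, index, key)
    elif isinstance(node, list):
        # Array items are filed under the name of the property holding the array.
        for item in node:
            index_by_property_name(item, index, name)
    elif name is not None:
        index[name].add(node)
    return index


generic = {"person": {"name": "Mike Bowers", "employer": {"name": "Nabisco"}}}
unique = {"person": {"personName": "Mike Bowers", "employer": {"companyName": "Nabisco"}}}

print(sorted(index_by_property_name(generic)["name"]))
print(sorted(index_by_property_name(unique)["personName"]))
```

With the generic name, a person and a company land in the same index entry; with `personName` and `companyName`, a lookup by name alone already identifies the kind of value.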

[Figure: the same person document (personId 11) shown on the Normalize slide, with every property carrying a unique, entity-prefixed name such as personName, phoneNumber, and companyName]

Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Performance

One JSON document requires one I/O
  • Relational requires dozens of I/Os for the same data
  • For example, the person document is 45K and requires 13 tables to represent it in a relational database
      • You have to join 13 tables to retrieve all of a person's data
      • Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
      • 4-to-6 I/Os x 13 joins = 52-to-78 I/Os
  • MarkLogic indexes are in RAM, so getting a doc is 1 I/O
  • MongoDB indexes are B-tree indexes and may require 4 I/Os
  • To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
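The I/O estimate above can be checked with a few lines of Python. This is only a sketch of the slide's arithmetic; the function name `relational_io_range` and its parameters are hypothetical, and the per-join costs are the figures quoted on the slide rather than measurements.

```python
def relational_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return the (low, high) I/O estimate for reading one entity via joins.

    Each join is assumed to cost 3-to-4 index I/Os plus 1-to-2 row I/Os,
    as stated on the slide.
    """
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


# The person entity is spread over 13 tables in the relational model.
low, high = relational_io_range(13)
print(f"relational: {low}-to-{high} I/Os, document: 1 I/O")
```

With 13 joins the function returns the same 52-to-78 range as the slide.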

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
  • This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

Relational equivalent:
  Addresses: Address ID, Street, Locality, Region, Postal Code, Country
  Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
  Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
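As a rough illustration of the multi-valued address property, the sketch below filters one person's addresses by purpose in plain Python, with no join table involved. The helper `addresses_for` is hypothetical, and the exact nesting of the `address` wrapper is an assumption based on the slide's JSON fragment.

```python
person = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],  # multi-valued property
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }},
    ],
}


def addresses_for(person_doc, purpose):
    """Return every address of the person that lists the given purpose."""
    return [
        entry["address"]
        for entry in person_doc.get("personAddresses", [])
        if purpose in entry["address"].get("addressPurpose", [])
    ]


shipping = addresses_for(person, "shipping")
print([a["addressLocality"] for a in shipping])
```

In the relational layout the same question would need the Persons, Persons Addresses, and Addresses tables joined together.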

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
  • Putting different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
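To show why a mixed-type array is natural in code, here is a minimal sketch that walks the payment methods above and branches on the keys each record actually has. The function `describe` and the choice to store account numbers as strings are assumptions made for illustration.

```python
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
    }},
]


def describe(method):
    """Summarize a payment method based on which keys are present."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


summaries = [describe(entry["paymentMethod"]) for entry in payment_methods]
print(summaries)
```

Both record shapes live in one array, whereas the relational model in this deck uses separate Visa Payment Methods and Bank Payment Methods tables.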

Document PROs: Everything is Data

Everything is data in JSON
  • Structure
  • Keys
  • Values
  • Arrays

Because structure is data
  • Structure can be changed by updating data: just write a query
  • Structure can be queried
      • For presence of keys
      • For nested relationships
      • For any combination of nesting, keys, values

{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
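The idea that structure can be queried and changed like any other data can be sketched in plain Python. The generator `find_paths` below is a hypothetical helper, not a MarkLogic API; it reports where a key occurs in a document, and the last lines change the structure simply by updating the data.

```python
def find_paths(node, key, path=()):
    """Yield the path to every occurrence of `key` in a nested document."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from find_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_paths(item, key, path + (index,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query the structure: which phones carry the optional key?
hidden = list(find_paths(doc, "hidePhoneNumber"))

# Change the structure by updating data: add the key to the first phone.
doc["person"]["personPhones"][0]["phone"]["hidePhoneNumber"] = False
hidden_after = list(find_paths(doc, "hidePhoneNumber"))
```

Before the update only the second phone has the key; afterwards both do, and no schema change was needed.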

Document PROs: Simple Easy Data Types
  – Attribute names have unlimited length and can contain any characters
  – Strings have unlimited size and can contain any internationalized UTF-8 characters
  – Numbers contain integers or decimals
  – Booleans are true or false
  – Arrays can have values of any type and can mix types
  – Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
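For readers who want to see these types in practice, the short sketch below parses a small JSON text with Python's standard `json` module and records the type each value maps to. The sample document and its key names are made up for illustration.

```python
import json

text = """
{
  "名前 (name)": "Zoë",
  "count": 3,
  "price": 3.5,
  "active": true,
  "mixed": [1, "two", false, {"nested": null}],
  "sparse": {}
}
"""

doc = json.loads(text)

# Map each top-level key to the Python type its JSON value was parsed into.
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)
```

The array mixes a number, a string, a boolean, and an object, and the empty object shows that attributes may simply be absent.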

Document PROs Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. CODD
IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency. …The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Diagram: a graph whose nodes are Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by labeled relationships: "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution".]

[Diagram: Customer Order. JSON documents for Customers, Orders, Products, and Vendors, plus XML Documentation, connected by labeled relationships: "Product that was ordered", "Vendor who sold the product", "Shipping instructions etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product".]

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size   Dose UOM
Minicillan   Drugs R Us           200         mg
Maxicillan   Canada4Less Drugs    400         mg
Minicillan   Drug USA             150         mg

Inside and Out

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [
        { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" } ] } ] }

JSON
1. No document type
2. Simple, easy, and fast to parse. No comments. No namespaces. No attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects. Comments. Attributes for metadata. CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
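The contrast between "tags added on top of text" and "text poured into objects" can be seen by reading the same date out of both formats. The sketch below uses only Python's standard library; the two sample strings are simplified versions of the slide's examples (namespaces and attributes are left out to keep it short).

```python
import json
import xml.etree.ElementTree as ET

# XML: the date is a tag sprinkled into running text (mixed content).
xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"
)
paragraph = ET.fromstring(xml_text)
xml_date = paragraph.find("date").text
# The text survives intact if the tags are ignored.
plain_text = "".join(paragraph.itertext())

# JSON: text and data each sit in their own object.
json_text = """
{ "paragraph": [
    { "type": "text", "value": "Data exists in separate objects" },
    { "type": "date", "value": "2017-04-02Z" } ] }
"""
parts = json.loads(json_text)["paragraph"]
json_dates = [part["value"] for part in parts if part["type"] == "date"]
```

In XML the sentence reads naturally once the tags are dropped, while in JSON the reader has to reassemble text from a list of typed objects.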

JSON is ideal for data structures
Developers work with data with maximal reliance on predictable structures

XML is ideal for content
Developers work with tagged content with minimal reliance on variable structure

Choosing Between XML and JSON

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.

Person ERD

One Person JSON Doc = 13 Tables

{ "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ],
          "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
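One way to see why a single document fans out into many tables is to count its arrays, since each multi-valued property needs a child table in a relational model. The sketch below does this for a trimmed-down person document; the helper `child_tables` is hypothetical, and the count it produces will not match the slide's hand-drawn 13-table ERD exactly because the ERD also reflects modeling decisions a simple traversal cannot know about.

```python
def child_tables(node, path="person"):
    """Yield the path of every array; each would become a child table."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from child_tables(value, f"{path}.{key}")
    elif isinstance(node, list):
        yield path
        for item in node:
            yield from child_tables(item, path)


# Trimmed-down sample based on the person document above.
doc = {
    "personName": "Mike Bowers",
    "personPhones": [
        {"phone": {"phonePurpose": ["mobile", "primary"],
                   "phoneNumber": "+1 (111) 111-1111"}},
        {"phone": {"phonePurpose": ["work"],
                   "phoneNumber": "+1 (111) 111-1112"}},
    ],
    "personEmails": [
        {"email": {"emailPurpose": ["work", "primary"],
                   "emailAddress": "mike@somework.com"}},
    ],
}

tables = sorted(set(child_tables(doc)))
# One root table for the person plus one table per multi-valued property.
table_count = 1 + len(tables)
```

Even this small sample needs five tables: the person row, two for phones and their purposes, and two for emails and their purposes.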

[Diagram: a larger entity-relationship diagram of placeholder tables (including Order, Order Items, Fact, and Lookup Lists) that repeat the column lists of the customer model tables.]

More Realistic Customer Order Model

Same Realistic Customer Order Model

  • A JSON document is a business entity
  • Meaningful graph relationships connect business entities

[Diagram: Customer Order. JSON documents for Customers, Orders, Products, and Vendors, connected by the relationships "Product that was ordered" and "Vendor who sold the product".]
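The idea that meaningful graph relationships connect business entities can be sketched with a plain list of subject, predicate, object triples. The triples below reuse the IDs and predicates from the person document in this deck; the helper `objects_of` is hypothetical and stands in for a real triple-store query.

```python
# (subject, predicate, object) triples; IDs refer to business-entity documents.
triples = [
    (11, "hasSpouse", 22),
    (11, "hasChild", 33),
    (11, "hasChild", 44),
    (11, "likesProduct", 1111),
    (11, "hasEmployer", 666),
]


def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


children = objects_of(triples, 11, "hasChild")
liked_products = objects_of(triples, 11, "likesProduct")
```

Each predicate names the meaning of the link, so a new kind of relationship is just another triple rather than a new join table.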

Person document

personId: 11
person:
  personName: Mike Bowers
  personBirthDate: 1981-01-01
  personGender: male
  personEthnicity: caucasian
  personShippingPreference: free
  personPhones: [
    phone: { phonePurpose: [ mobile, primary ], phoneNumber: +1 (111) 111-1111 } ]
  personAddresses: [
    address: {
      addressPurpose: [ shipping, mailing ], addressStreet: [ 99 Smith Dr ],
      addressLocality: Clinton, addressRegion: UT,
      addressPostalCode: 84015, addressCountry: United States } ]
  personEmails: [
    email: { emailPurpose: [ work, primary ], emailAddress: mike@somework.com } ]
  personWebsites: [
    website: { websitePurpose: [ work ], websiteUrl: http://www.mike.com } ]
  personPaymentMethods: [
    paymentMethod: {
      paymentMethodType: Visa, paymentMethodStatus: Verified,
      creditCardNumber: 1234123412341234, creditCardExpirationDate: 2011-11-11,
      nameOnCreditCard: MIKE BOWERS, creditCardValidationAddress: mailing } ]
customer:
  customerStatus: active
  customerJoinDate: 2001-01-01
  customerReviewerRank: 11111

Order document

orderId: 1
order:
  orderNumber: 111-11-111
  orderDate: 2001-01-01T09:00:00Z
  orderStatus: Shipped
  customer:
    customerId: 11
    customerName: Mike Bowers
  orderShipment:
    shipper: { companyIdFk: 777, companyName: FedEx }
    shipmentMethod: 2nd Day Air, shipmentCost: 5.87, shipmentNotes: …
  orderShippingAddress:
    addressStreet: [ 99 Smith Dr ]
    addressLocality: Clinton, addressRegion: UT
    addressPostalCode: 84015, addressCountry: United States
  orderedProducts: [
    product:
      productIdFk: 1111
      productName: Oreo Thins Sandwich Cookies
      productOrderQty: 3, productUnitPrice: 3.50, productCondition: New
      productWeightOz: 11
      productOrderStatus: Shipped
      productShippedDate: 2001-01-01+0101
      seller: { sellerId: 555, sellerName: Nabisco }
      supplier: { supplierId: 555, supplierName: Nabisco }
    product:
      productIdFk: 2222
      productName: Rice Dream Rice Drink - Vanilla 64 Fl Oz
      productOrderQty: 1, productUnitPrice: 2.50, productCondition: New
      …

Product document

productId: 1111
product:
  productCode: oreo-111
  productStatus: active
  productName: Oreo Thins Sandwich Cookies
  productDescription: YUMMY
  productCategories: [ cookies chocolate cookies desserts snacks …
  productTags: [ cookie, chocolate, snack, treat ]
  productDetailDescriptionURL: httpmysalessitecomcdndetailsoreo-111
  productAvailabilityDate: 2001-01-01
  productListPrice: 4.50
  productDiscountPrice: 3.50
  productStandardCost: 2.50
  productInventoryReorderLevel: 100
  productInventoryTargetLevel: 1000
  productVendor: { vendorId: 555, companyName: Nabisco }
  productSuppliers: [
    productSupplier: { supplierId: 555, companyName: Nabisco,
      standardSupplierProductPrice: 2.50, currentSupplierProductPrice: 2…
      supplierProductQuantityPerUnit: 1 box of 35 cookies, supplierProdu…
  productMeasures: [
    productMeasure:
      productMeasurePurpose: productPackage, productWeight: 101
      productWidth: 45, productHeight: 17, productLength: 67
    productMeasure:
      productMeasurePurpose: shippingPackage, productWeight: 1…
      productWidth: 5, productHeight: 2, productLength: 8
  …

Company document

companyId: 555
company:
  companyName: Nabisco
  companyStatus: active
  companyPhones: [
    phone: { phonePurpose: sales …
    phone: { phonePurpose: support …
    phone: { phonePurpose: shipping …
  companyAddresses: [
    address:
      addressPurpose: [ shipping …
      addressLocality: East Hanove…
      addressPostalCode: 07936 …
  companyEmails: [
    email: { emailPurpose: sales …
  companyWebsites: [
    website: { websitePurpose: home …
  companyPaymentMethods: [
    paymentMethod:
      paymentMethodType: Che…
      paymentMethodAccountNumber: 555…
      paymentMethodVerified: true
supplier: { supplierJoinDate: 2005-05…
seller: { sellerJoinDate: 2005-05…

Lookup document

orderLookupId: 111111111
orderStatus: [ New, Invoiced, Shipped, Closed ]
productOrderStatus: [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]


Relational Modeling: Normalize

1. Normalize

• Make each attribute single-valued

  – Create one column per attribute

  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables

[Diagram: one-to-one, one-to-many, and many-to-many relationships, plus reference tables]
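
To make the normalization step concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema (person and phone) and its column names are assumptions chosen for illustration, not the tables from the slides. It shows a multivalued attribute (a person's phones) flattened into a one-to-many table, and the join needed to put the person back together.

```python
import sqlite3

# Hypothetical schema for illustration: one row per person, one row per phone.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE phone (
        phone_id  INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        purpose   TEXT NOT NULL,
        number    TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.executemany(
    "INSERT INTO phone (person_id, purpose, number) VALUES (?, ?, ?)",
    [
        (11, "mobile", "+1 (111) 111-1111"),
        (11, "work", "+1 (111) 111-1112"),
    ],
)

# The multivalued attribute now lives in its own table,
# so reading a person with phones requires a join.
rows = conn.execute(
    """
    SELECT p.name, ph.purpose, ph.number
    FROM person AS p
    JOIN phone AS ph ON ph.person_id = p.person_id
    WHERE p.person_id = ?
    ORDER BY ph.phone_id
    """,
    (11,),
).fetchall()

for name, purpose, number in rows:
    print(name, purpose, number)
```

Every additional multivalued attribute (addresses, emails, websites, payment methods) would add another table and another join in the same way.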

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it

personId: 11
schemas: [
  schema: { type: person, schemaVersion: 10 }
  schema: { type: customer, schemaVersion: 10 }
  schema: { type: companyRelationships, schemaVersion: 10 } ]
triples: [
  triple: { subject: 11, predicate: hasSpouse, object: 22 }
  triple: { subject: 11, predicate: hasChild, object: 33 }
  triple: { subject: 11, predicate: hasChild, object: 44 }
  triple: { subject: 11, predicate: likesProduct, object: 1111 }
  triple: { subject: 11, predicate: hasEmployer, object: 666,
    tripleStatus: active, tripleCreatedOn: 2016-10-05,
    tripleEffectivityStartDate: 1966-06-06, tripleEffectivityEndDate: null } ]
person:
  personStatus: active
  personName: Mike Bowers
  personOnlineName: MTB1
  personNameVariations:
    personFirstName: Mike
    personLastName: Bowers
    middleName: Thomas
    personMaidenName: null
    personLegalName: Michael Thomas Bowers
    personSortedName: Bowers Mike
    personNickname: Mikey
    personInformalLetterName: Mike
    personFormalLetterName: Mike Bowers
  personBirthDate: 1981-01-01
  personGender: male
  personEthnicity: caucasian
  personShippingPreference: free
  personPhones: [
    phone: { phonePurpose: [ mobile, primary ], phoneNumber: +1 (111) 111-1111 }
    phone: { phonePurpose: [ work ], phoneNumber: +1 (111) 111-1112 }
    phone: { phonePurpose: [ home ], phoneNumber: +1 (111) 111-1113 } ]
  personAddresses: [
    address: {
      addressPurpose: [ shipping, mailing ], addressStreet: [ 99 Smith Dr ],
      addressLocality: Clinton, addressRegion: UT,
      addressPostalCode: 84015, addressCountry: United States } ]
  personEmails: [
    email: { emailPurpose: [ work, primary ], emailAddress: mike@somework.com }
    email: { emailPurpose: [ personal ], emailAddress: mike@somewhere.com } ]
  personWebsites: [
    website: { websitePurpose: [ work ], websiteUrl: http://www.mike.com }
    website: { websitePurpose: [ linkedin, primary ], websiteUrl: https://linked.com/mike } ]
  personPaymentMethods: [
    paymentMethod: {
      paymentMethodType: Visa, paymentMethodStatus: Verified,
      creditCardNumber: 1234123412341234, creditCardExpirationDate: 2011-11-11,
      nameOnCreditCard: MIKE BOWERS, creditCardValidationAddress: mailing }
    paymentMethod: {
      paymentMethodType: Checking, paymentMethodStatus: Verified,
      bankRoutingNumber: 11111111, bankAccountNumber: 11111111,
      nameOnBankAccount: Michael Bowers,
      driversLicenseNumber: 11111111, driversLicenseState: UT } ]
customer:
  customerStatus: active
  customerJoinDate: 2001-01-01
  customerReviewerRank: 11111
companyRelationships: [
  company: {
    companyIdFk: 555, companyName: Nabisco,
    personCompanyRelationship: [ employeeOf, accountRepFor ],
    personCompanyStatus: active, personCompanyEffectiveness: Effective,
    personCompanyStartDate: 2001-01-01, personCompanyEndDate: null,
    accountRepSupportedProducts: [ product: { productIdFk: 1111, productName: Oreos } ] }
  company: {
    companyIdFk: 666, companyName: Goliath Dairies,
    personCompanyRelationship: [ independentContractorFor, deliveryPersonFor ],
    personCompanyStatus: active, personCompanyEffectiveness: Effective,
    personCompanyStartDate: 2001-01-01, personCompanyEndDate: null,
    deliverySchedule: 2 pm Mon-Fri, performanceNotes: He is always late } ]
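
As a rough illustration of the document side of normalization, the sketch below builds a cut-down version of the person document above as a Python dictionary and reads an embedded value straight out of it. The exact nesting (each array element wrapped in a "phone" object) is an assumption about punctuation lost from the slide, and primary_phone is a hypothetical helper, not part of any MarkLogic API.

```python
import json

# Cut-down person document; nesting is assumed from the slide text.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}


def primary_phone(doc):
    """Return the number of the first phone whose purposes include 'primary'."""
    for entry in doc["person"]["personPhones"]:
        phone = entry["phone"]
        if "primary" in phone["phonePurpose"]:
            return phone["phoneNumber"]
    return None


# The whole entity serializes and loads as a single unit; no join is needed.
round_tripped = json.loads(json.dumps(person_doc))
print(primary_phone(round_tripped))
```

The multivalued phones stay inside the entity as an array, while each embedded phone structure remains internally normalized: one purpose list and one number.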

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
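
The idea can be sketched in plain Python. This is only a toy model of the concept: it does not use TDE templates or the Optic API, and the hasName predicate and both helper functions are hypothetical names introduced for illustration. Triples are pulled out of each document into one shared index, and a query then follows a link from a person to the employer's name without copying that name into the person document.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

# Two small documents keyed by ID, using identifiers from the slide examples.
documents = {
    11: {
        "personId": 11,
        "person": {"personName": "Mike Bowers"},
        "triples": [
            {"subject": 11, "predicate": "hasEmployer", "object": 666},
            {"subject": 11, "predicate": "likesProduct", "object": 1111},
        ],
    },
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
}


def build_triple_index(docs):
    """Collect embedded triples plus name triples derived from each document."""
    index = []
    for doc in docs.values():
        for t in doc.get("triples", []):
            index.append(Triple(t["subject"], t["predicate"], t["object"]))
        # Derived triples stand in for what an extraction template would emit.
        if "person" in doc:
            index.append(Triple(doc["personId"], "hasName",
                                doc["person"]["personName"]))
        if "company" in doc:
            index.append(Triple(doc["companyId"], "hasName",
                                doc["company"]["companyName"]))
    return index


def employer_names(index, person_id):
    """Follow hasEmployer links, then look up each employer's name."""
    employers = {t.object for t in index
                 if t.subject == person_id and t.predicate == "hasEmployer"}
    return sorted(t.object for t in index
                  if t.predicate == "hasName" and t.subject in employers)


triple_index = build_triple_index(documents)
print(employer_names(triple_index, 11))
```

The company name is stored once, in the company document, yet it can be read alongside the person through the index.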


Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Diagram: business tables, transaction tables, and reference tables]


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc
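
A small sketch of the last point, assuming a lookup document shaped like the order lookup list in the example documents. The invalid_statuses helper and the sample order are hypothetical; the status values come from the slide. One document holds two lookup lists, and an order is checked against both.

```python
# One lookup document combining two lookup lists (values from the slide).
lookup_doc = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def invalid_statuses(order, lookups):
    """Return (property, value) pairs whose value is not in the lookup list."""
    problems = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for item in order["orderedProducts"]:
        status = item["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


# Hypothetical order with one status that is not in the lookup list.
order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"productOrderStatus": "Shipped"},
        {"productOrderStatus": "Lost"},
    ],
}

problems = invalid_statuses(order, lookup_doc)
print(problems)
```

In a relational model each list would typically be its own reference table; here both travel together in a single document.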

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• For the subclassing, see the customer and companyRelationships sections of the person document shown earlier for the Normalize step
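
One way to read the subclassing idea is that a general person document gains a role simply by carrying the matching section. The sketch below assumes that reading; the roles helper is hypothetical, and the nesting of companyRelationships is inferred from the slide text.

```python
# Cut-down person document with two role-bearing sections.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {
            "companyIdFk": 555, "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {
            "companyIdFk": 666, "companyName": "Goliath Dairies",
            "personCompanyRelationship": ["independentContractorFor",
                                          "deliveryPersonFor"]}},
    ],
}


def roles(doc):
    """List the roles a person document carries, based on its sections."""
    found = set()
    if "customer" in doc:
        found.add("customer")
    for entry in doc.get("companyRelationships", []):
        found.update(entry["company"]["personCompanyRelationship"])
    return sorted(found)


print(roles(person_doc))
```

A person with no extra sections is still a valid person document, and adding a role later does not change the shape of the base person section.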


Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality (range) indexes based on property name or hierarchical path
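
The naming advice can be illustrated with a toy name-based index in Python. This is only a model of the idea, not how MarkLogic implements its indexes, and both functions are hypothetical. Values are indexed by property name alone, ignoring position in the hierarchy, which is why a generic name such as status becomes ambiguous while personStatus and customerStatus stay precise.

```python
from collections import defaultdict


def iter_properties(node, name=None):
    """Yield (name, value) for every scalar property at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_properties(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_properties(item, name)
    elif name is not None:
        yield name, node


def build_equality_index(docs):
    """Map (property name, value) to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for name, value in iter_properties(doc):
            index[(name, value)].add(doc_id)
    return index


# Generic names: 'status' means different things at different depths.
generic_docs = {
    11: {"person": {"status": "active"}, "customer": {"status": "closed"}},
    22: {"person": {"status": "inactive"}, "customer": {"status": "active"}},
}
# Unique names: each property name carries its own context.
unique_docs = {
    11: {"person": {"personStatus": "active"},
         "customer": {"customerStatus": "closed"}},
    22: {"person": {"personStatus": "inactive"},
         "customer": {"customerStatus": "active"}},
}

generic_index = build_equality_index(generic_docs)
unique_index = build_equality_index(unique_docs)

print(sorted(generic_index[("status", "active")]))          # both documents match
print(sorted(unique_index[("customerStatus", "active")]))   # only the active customer
```

With generic names, a lookup for active customers also returns document 11, whose person status is active but whose customer status is closed.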


Agenda

1. Database Modeling Paradigms

2. From Relational to Document Graph

3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

• One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs × 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
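The IO arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the estimate as stated (3-to-4 index IOs plus 1-to-2 row IOs per join, 13 joins); the function name and its parameters are illustrative and not part of any database API.

```python
def relational_io_range(tables, index_ios=(3, 4), row_ios=(1, 2)):
    """Return the (low, high) IO estimate for reading one entity spread over `tables` tables."""
    per_join_low = index_ios[0] + row_ios[0]    # 3 + 1 = 4 IOs per join
    per_join_high = index_ios[1] + row_ios[1]   # 4 + 2 = 6 IOs per join
    return tables * per_join_low, tables * per_join_high


low, high = relational_io_range(13)
print(f"relational: {low}-to-{high} IOs, single document: 1 IO")
```

Under these assumptions the relational read costs 52-to-78 IOs, against one IO for the single document.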


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
]

Relational tables needed for the same data:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
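As a minimal sketch of what a multi-valued property looks like to application code, the address sample can be written as a Python dictionary and filtered by purpose. The exact nesting (an `address` object inside each array entry) and the helper `addresses_for` are assumptions made for illustration, since the slide text has lost its JSON punctuation.

```python
# Assumed shape of the slide's sample: arrays hold the multi-valued properties.
person = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],   # one address, two purposes
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }},
    ],
}


def addresses_for(doc, purpose):
    """Return every address in the document that is tagged with `purpose`."""
    return [
        entry["address"]
        for entry in doc["personAddresses"]
        if purpose in entry["address"]["addressPurpose"]
    ]


shipping = addresses_for(person, "shipping")
billing = addresses_for(person, "billing")
print(len(shipping), len(billing))
```

No join table is involved: the person-to-address relationship and the address-to-purpose relationship both live inside the one document.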

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111 driversLicenseState UT
]
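A short sketch of how code might consume such a multi-typed array: each record carries its own type tag, and the reader branches on it. The flat record layout and the function `describe_payment_method` are illustrative assumptions, not something the slides define.

```python
def describe_payment_method(method):
    """Summarize one payment record based on its paymentMethodType tag."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending in {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account at routing number {method['bankRoutingNumber']}"
    return f"Unknown payment method type: {kind}"


# Two differently shaped records in the same array, as on the slide.
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
    },
]

descriptions = [describe_payment_method(m) for m in payment_methods]
print(descriptions)
```

In a relational model the same data typically needs one table per payment type (compare the Visa Payment Methods and Bank Payment Methods tables in the person ERD).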


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values
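To make "structure can be queried" concrete, here is a small Python sketch that searches a nested document for every place a key occurs and reports the path to it. It is plain Python over an in-memory document, not a MarkLogic query; the function `find_key_paths` and the document shape are assumptions for illustration.

```python
def find_key_paths(node, key, path=()):
    """Yield the path to every occurrence of `key` inside nested dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from find_key_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from find_key_paths(item, key, path + (position,))


doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

hidden = list(find_key_paths(doc, "hidePhoneNumber"))
numbers = list(find_key_paths(doc, "phoneNumber"))
print(hidden)
print(len(numbers))
```

The first query answers a purely structural question (which phone carries the optional `hidePhoneNumber` key), without any schema declaring that key in advance.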

personId 11

person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 575
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
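The handful of JSON types listed above can be seen by parsing a small document with Python's standard `json` module. The sample values are invented for illustration (the height is written here as 5.75, assuming the slide's "575" lost its decimal point).

```python
import json

text = """
{
  "personId": 11,
  "personName": "Zo\u00eb \u674e",
  "personHeightInFeet": 5.75,
  "hidePhoneNumber": true,
  "mixed": ["a", 1, 2.5, false, null, {"nested": []}]
}
"""

doc = json.loads(text)

# Map each top-level property to the Python type the parser produced.
kinds = {name: type(value).__name__ for name, value in doc.items()}
# Types found inside the single mixed-type array.
mixed_kinds = [type(item).__name__ for item in doc["mixed"]]
print(kinds)
print(mixed_kinds)
```

One array holds a string, an integer, a decimal, a boolean, a null, and a nested object, which is the "arrays can mix types" point from the slide.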

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6

June 1970

Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths

[Figure: knowledge graph. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: document collections connected by labeled graph relationships. Collections: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), Documentation (XML). Relationship labels: Customer Order; Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Sample XML document shown in the figure:

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name  | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us        | 200       | mg
Maxicillan | Canada4Less Drugs | 400       | mg
Minicillan | Drug USA          | 150       | mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models?
  • Multi-Model Enterprise NoSQL Databases
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense?
  • How does NoSQL support variety and variability?
  • Choosing Between XML and JSON
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like?
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Same Realistic Customer Order Model
  • Relational Modeling: Normalize
  • DocGraph Modeling: Normalize
  • DocGraph Modeling: Denormalize
  • Relational Modeling: Orthogonalize
  • Relational Modeling: Generalize
  • DocGraph Modeling: Generalize
  • Relational Modeling: Tune
  • DocGraph Modeling: Tune
  • Document PROs: Simple, Easy Data Types



Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


Person ERD

The person ERD is 13 tables because each multivalued property requires a separate table in a relational database.

All of this should be represented as one JSON document because it is one business entity.

One Person JSON Doc = 13 Tables

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]


More Realistic Customer Order Model

[Figure: entity-relationship diagram with many tables; the table names and column lists on the slide are repeated placeholders.]

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: four JSON document collections (Customers, Orders, Products, Vendors) connected by labeled relationships: Customer Order; Product that was ordered; Vendor who sold the product.]


Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure: One-to-one, One-to-many, Many-to-many, Reference Tables]
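The flattening step can be sketched in Python: one nested person document with a multi-valued phone list, whose phones each have multi-valued purposes, becomes rows in three single-valued tables. The table and column names below are hypothetical and only loosely follow the person ERD; this is not a recommended schema.

```python
def shred_person(doc):
    """Flatten one nested person document into rows for three single-valued tables."""
    person_id = doc["personId"]
    persons = [{"person_id": person_id, "name": doc["person"]["personName"]}]
    person_phones = []
    phone_purposes = []
    for phone_id, entry in enumerate(doc["person"]["personPhones"], start=1):
        phone = entry["phone"]
        # One-to-many: each phone becomes its own row with a foreign key to the person.
        person_phones.append(
            {"phone_id": phone_id, "person_id": person_id, "number": phone["phoneNumber"]}
        )
        # The multi-valued purpose needs yet another table.
        for purpose in phone["phonePurpose"]:
            phone_purposes.append({"phone_id": phone_id, "purpose": purpose})
    return {"persons": persons, "person_phones": person_phones, "phone_purposes": phone_purposes}


doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}

tables = shred_person(doc)
print({name: len(rows) for name, rows in tables.items()})
```

Reading the person back means joining these tables again, which is the cost the performance slide estimates.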

DocGraph Modeling Normalize personId 11 schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ] triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111 triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
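A minimal sketch in Python of the first three points, assuming plain dicts stand in for JSON documents. The property names come from the person example. The quoting and nesting are assumptions (the slide text lost its JSON punctuation), and `phones_for` is a hypothetical helper, not a MarkLogic API.

```python
import json

# One business entity per document: a person with embedded, multi-valued phones.
# The exact nesting is an assumption made for illustration.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ],
    },
}


def phones_for(doc, purpose):
    """Return the phone numbers whose phonePurpose array contains `purpose` (hypothetical helper)."""
    return [
        entry["phone"]["phoneNumber"]
        for entry in doc["person"]["personPhones"]
        if purpose in entry["phone"]["phonePurpose"]
    ]


print(json.dumps(person_doc, indent=2))
print(phones_for(person_doc, "primary"))
```

Each phone is itself a small normalized structure (one purpose list, one number), yet the whole person is read and written as a single document.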

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
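The idea can be sketched in plain Python, assuming an in-memory list stands in for the triple index. Nothing here is MarkLogic's TDE or Optic API: `extract_triples` and `names_of_children` are hypothetical helpers, the `hasName` predicate is an assumption, and the names given to persons 33 and 44 are invented sample values.

```python
# Two kinds of facts live in documents: relationships (the embedded triples)
# and ordinary properties such as personName.
docs = [
    {"personId": 11, "personName": "Mike Bowers",
     "triples": [(11, "hasChild", 33), (11, "hasChild", 44)]},
    {"personId": 33, "personName": "Child A", "triples": []},  # invented sample name
    {"personId": 44, "personName": "Child B", "triples": []},  # invented sample name
]


def extract_triples(documents):
    """Project each document into triples, loosely imitating what a template would do."""
    index = []
    for doc in documents:
        index.extend(doc["triples"])  # relationship triples stored in the document
        index.append((doc["personId"], "hasName", doc["personName"]))  # projected property
    return index


def names_of_children(index, parent_id):
    """Join hasChild triples to hasName triples; no document stores a copied name."""
    child_ids = [o for s, p, o in index if s == parent_id and p == "hasChild"]
    return [o for s, p, o in index if p == "hasName" and s in child_ids]


triple_index = extract_triples(docs)
print(names_of_children(triple_index, 11))
```

The children's names stay in their own documents. The query sees them next to the parent only because the index holds both the link and the projected property.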

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context


[Figure: business tables, transaction tables, and reference tables]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model


DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
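A rough Python sketch of the subclassing idea, assuming each role is an optional section of the generalized person document. The section names `customer` and `companyRelationships` and their values come from the example. The dict layout and the helper `roles_of` are assumptions for illustration.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    # Role-specific sections "subclass" the general person.
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def roles_of(doc):
    """List the roles a person document carries, read directly from its sections."""
    roles = []
    if "customer" in doc:
        roles.append("customer")
    for entry in doc.get("companyRelationships", []):
        roles.extend(entry["company"]["personCompanyRelationship"])
    return roles


print(roles_of(person_doc))
```

Because every role is a named section, the purpose of each part of the model stays visible in the document itself.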

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance


DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
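One way to check the unique-name guideline before loading documents is a small script like the following. This is a sketch in plain Python, not a MarkLogic tool: `property_paths` and `ambiguous_names` are hypothetical helpers, and the sample document that reuses a `name` property is invented to show a violation.

```python
from collections import defaultdict


def property_paths(node, path=()):
    """Yield (name, parent_path) for every property in a nested dict/list structure."""
    if isinstance(node, dict):
        for name, value in node.items():
            yield name, path
            yield from property_paths(value, path + (name,))
    elif isinstance(node, list):
        for item in node:
            yield from property_paths(item, path)


def ambiguous_names(doc):
    """Return property names that occur under more than one parent path."""
    seen = defaultdict(set)
    for name, parent in property_paths(doc):
        seen[name].add(parent)
    return sorted(name for name, parents in seen.items() if len(parents) > 1)


# Invented counter-example: "name" is reused for the person and for the company.
bad_doc = {"person": {"name": "Mike Bowers"}, "company": {"name": "Nabisco"}}
good_doc = {"person": {"personName": "Mike Bowers"}, "company": {"companyName": "Nabisco"}}

print(ambiguous_names(bad_doc))   # the reused property name is reported
print(ambiguous_names(good_doc))  # no reuse, so nothing is reported
```

Prefixing names with their entity (`personName`, `companyName`), as the examples do, lets a name-based lookup hit exactly one kind of value.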

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
  – You have to join 13 tables to retrieve all of a person's data
  – Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
  – 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
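The IO estimate is simple multiplication, worked here in Python. The function name `join_io_range` is hypothetical, and the figures are the slide's own assumptions (4-to-6 IOs per join, 13 joins), not measurements.

```python
def join_io_range(tables, min_io_per_join=4, max_io_per_join=6):
    """Estimated (low, high) IO count for reading one entity spread over `tables` tables."""
    return tables * min_io_per_join, tables * max_io_per_join


low, high = join_io_range(13)
print(f"relational: {low}-to-{high} IOs; document: 1 IO")
```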


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]

The same data in a relational model needs three tables:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
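As a rough illustration of the comparison above, the following sketch shreds one person document into rows for the three relational tables. The function name `shred_person`, the snake_case column names, and the generated `address_id` values are assumptions made for this example; they do not come from the slides or from any MarkLogic API.

```python
# Hypothetical helper: split one person document into rows for the
# Persons, Addresses, and Persons Addresses tables.
def shred_person(doc):
    person_id = doc["personId"]
    persons = [{"person_id": person_id, "name": doc["person"]["personName"]}]
    addresses = []
    persons_addresses = []
    for address_id, item in enumerate(doc["person"]["personAddresses"], start=1):
        addr = item["address"]
        addresses.append({
            "address_id": address_id,  # surrogate key invented for the example
            "street": " ".join(addr["addressStreet"]),
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postal_code": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        # The multi-valued addressPurpose becomes one join row per value.
        for purpose in addr["addressPurpose"]:
            persons_addresses.append({
                "person_id": person_id,
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, addresses, persons_addresses


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }}
        ],
    },
}

persons, addresses, persons_addresses = shred_person(person_doc)
print(len(persons), len(addresses), len(persons_addresses))
```

One document with a single two-purpose address already produces rows in three tables, which is the overhead the slide is pointing at.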

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
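To make the "natural in programming" point concrete, here is a minimal sketch of application code walking a mixed-type array and branching on `paymentMethodType`. The `describe` function and its output wording are hypothetical; only the property names come from the slide.

```python
# Two differently shaped records in the same array, as on the slide.
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
    }},
]


def describe(method):
    """Return a short summary, using whichever fields the record type has."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return "Visa card ending in " + method["creditCardNumber"][-4:]
    if kind == "Checking":
        return "Checking account, routing number " + method["bankRoutingNumber"]
    return "Unknown payment method type: " + kind


summaries = [describe(item["paymentMethod"]) for item in payment_methods]
for line in summaries:
    print(line)
```

Each record carries only the properties that make sense for its type, so no columns are left null the way they would be in a single wide relational table.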


Document PROs: Everything is Data

Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays

Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
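Querying structure "for presence of keys" can be sketched in plain Python as a recursive walk that reports every path at which a key occurs. This is only an illustration of the idea; `find_key_paths` is a hypothetical helper, not a MarkLogic function, and a database would answer the same question from its indexes rather than by walking each document.

```python
def find_key_paths(node, key, path=()):
    """Yield the path to every occurrence of `key` in a nested document."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from find_key_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key_paths(value, key, path + (index,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

hidden = list(find_key_paths(doc, "hidePhoneNumber"))
numbers = list(find_key_paths(doc, "phoneNumber"))
print(hidden)
print(len(numbers))
```

The sparse `hidePhoneNumber` key is found only on the second phone, and the returned path also records the nesting that leads to it.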

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
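A small sketch using Python's standard `json` module shows these types side by side. The sample keys and values below are invented for the illustration.

```python
import json

# Sample text: a key with spaces and non-ASCII characters, an integer,
# a decimal, a boolean, a mixed-type array, and an empty (sparse) object.
text = (
    '{"名前 / name with spaces": "Zoë 李", "count": 3, "price": 3.5, '
    '"active": true, "mixed": [1, "two", false, null, {"nested": []}], '
    '"sparse": {}}'
)

doc = json.loads(text)

# Map each top-level property to the Python type it was parsed into.
types = {key: type(value).__name__ for key, value in doc.items()}
mixed_types = [type(value).__name__ for value in doc["mixed"]]

# Serializing and parsing again gives back an equal document.
round_trip = json.loads(json.dumps(doc, ensure_ascii=False))

print(types)
print(mixed_types)
print(round_trip == doc)
```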

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of nodes labeled Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E). Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. JSON documents (Customers, Orders, Products, Vendors) and XML documentation are linked by labeled relationships: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

XML example:

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size   Dose UOM
Minicillan    Drugs R Us           200         mg
Maxicillan    Canada4Less Drugs    400         mg
Minicillan    Drug USA             150         mg

Inside and Out




Simple Customer Order Relational Model

What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.

All of this should be represented as one JSON document because it is one business entity.

Person ERD

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 10 } },
    { "schema": { "type": "customer", "schemaVersion": 10 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 10 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}



More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: Customer Order model with four kinds of JSON documents (Customers, Orders, Products, Vendors) linked by the relationships "Product that was ordered" and "Vendor who sold the product".]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure: table relationship types: One-to-one, One-to-many, Many-to-many, Reference Tables]

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it


DocGraph Modeling Denormalize personId 11 schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ] triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111 triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

• TDE (Template Driven Extraction) can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing: linked data is available at query time without being copied into each document
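The idea in the bullets above can be imitated in plain Python. This is only a sketch under stated assumptions, not MarkLogic's TDE or Optic API: the `documents` dictionary, the `extract_triples` helper, the `hasName` predicate, and person 22 with its name are hypothetical and introduced purely for illustration. Facts are projected out of each document into a flat triple list (standing in for the triple index), and a query then reads a spouse's name through the `hasSpouse` link even though that name is never stored in person 11's document.

```python
# Two person documents keyed by personId. Document 22 and its name are
# made up for this sketch; only person 11 appears in the slides.
documents = {
    11: {
        "person": {"personName": "Mike Bowers"},
        "triples": [{"subject": 11, "predicate": "hasSpouse", "object": 22}],
    },
    22: {
        "person": {"personName": "Example Spouse"},
        "triples": [],
    },
}


def extract_triples(docs):
    """Build a flat triple list from the documents (a stand-in for an index).

    Embedded triples are copied as-is, and each person's name is projected
    into an extra (id, "hasName", name) triple, loosely imitating what an
    extraction template does.
    """
    index = []
    for doc_id, doc in docs.items():
        index.append((doc_id, "hasName", doc["person"]["personName"]))
        for t in doc["triples"]:
            index.append((t["subject"], t["predicate"], t["object"]))
    return index


def objects_of(index, subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in index if s == subject and p == predicate]


def spouse_names(index, person_id):
    """Follow hasSpouse links, then read each spouse's name from the index."""
    names = []
    for spouse_id in objects_of(index, person_id, "hasSpouse"):
        names.extend(objects_of(index, spouse_id, "hasName"))
    return names


triple_index = extract_triples(documents)
print(spouse_names(triple_index, 11))
```

The spouse's name lives in exactly one document, yet the query returns it as though it belonged to person 11, which is the sense in which nothing had to be denormalized.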

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

Business Tables, Transaction Tables, Reference Tables

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc
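The last bullet can be illustrated with a small Python sketch. It assumes the lookup document from the slides is loaded as a dictionary; the `invalid_fields` helper and the invalid status value `"Lost"` are hypothetical, added only to show one document serving as several reference tables at once.

```python
# One lookup document holding several value lists that a relational model
# would usually keep in separate reference tables.
order_lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": [
        "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock",
    ],
}


def invalid_fields(record, lookup):
    """Return the names of fields whose value is not in the matching list.

    Only fields that have a list of the same name in the lookup document
    are checked; every other field is ignored.
    """
    bad = []
    for field, value in record.items():
        allowed = lookup.get(field)
        if isinstance(allowed, list) and value not in allowed:
            bad.append(field)
    return bad


order = {"orderNumber": "111-11-111", "orderStatus": "Shipped"}
line_item = {"productIdFk": 1111, "productOrderStatus": "Lost"}  # made-up bad value

print(invalid_fields(order, order_lookup))      # []
print(invalid_fields(line_item, order_lookup))  # ['productOrderStatus']
```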

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because each JSON document is an object, and objects are designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the subclassing in the person document shown earlier: the customer and companyRelationships sections sit alongside the base person section, and the schemas list names each of them
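A rough Python sketch of this kind of subclassing follows. It assumes a trimmed-down version of the person document, with the `schemas` list simplified to plain strings (the slides also carry a schema version per entry); the helper names `has_role` and `company_roles` are hypothetical.

```python
# A trimmed-down person document. The "schemas" list names the sections the
# document carries; "customer" and "companyRelationships" extend "person".
person_doc = {
    "personId": 11,
    "schemas": ["person", "customer", "companyRelationships"],
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"companyName": "Nabisco",
         "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
        {"companyName": "Goliath Dairies",
         "personCompanyRelationship": ["independentContractorFor",
                                       "deliveryPersonFor"]},
    ],
}


def has_role(doc, schema_type):
    """True if the document declares the schema and carries that section."""
    return schema_type in doc.get("schemas", []) and schema_type in doc


def company_roles(doc):
    """Collect every relationship the person has with any company."""
    roles = set()
    for rel in doc.get("companyRelationships", []):
        roles.update(rel["personCompanyRelationship"])
    return roles


print(has_role(person_doc, "customer"))   # True
print(has_role(person_doc, "supplier"))   # False
print(sorted(company_roles(person_doc)))
```

Adding a new role means adding a section and a schema entry to the same document; the base `person` section is untouched, so the purpose of the model stays visible.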

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality (range) indexes based on property name or hierarchical path
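Why unique property names matter can be seen in a toy index written in Python. This is a sketch, not how MarkLogic stores its indexes: `walk`, `build_name_index`, and the badly named `generic-1` document are hypothetical. The index is keyed only by property name and value, so position in the hierarchy is ignored, as the second bullet describes.

```python
from collections import defaultdict


def walk(node, path=()):
    """Yield (path, value) for every scalar value nested inside node."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))
    elif isinstance(node, list):
        for child in node:
            yield from walk(child, path)
    else:
        yield path, node


def build_name_index(docs):
    """Map (property name, value) -> set of document ids, ignoring depth."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for path, value in walk(doc):
            index[(path[-1], value)].add(doc_id)
    return index


docs = {
    "person-11": {"person": {"personName": "Mike Bowers"},
                  "companyRelationships": [{"companyName": "Nabisco"}]},
    "company-555": {"company": {"companyName": "Nabisco"}},
    # Hypothetical badly named document: a generic "name" at two levels.
    "generic-1": {"person": {"name": "Nabisco"},
                  "company": {"name": "Mike Bowers"}},
}

index = build_name_index(docs)
print(sorted(index[("companyName", "Nabisco")]))  # ['company-555', 'person-11']
print(sorted(index[("name", "Nabisco")]))         # ['generic-1']
```

With `personName` and `companyName`, a hit on the name alone says what kind of thing matched. With a generic `name`, the index cannot tell a person called Nabisco from a company called Nabisco without also checking the path.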

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• The five JSON documents in this example (person, order, product, company, and order lookup) equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
  • You have to join 13 tables to retrieve all of a person's data
  • Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
  • 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
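The IO estimate above is simple arithmetic, sketched here in Python. The function name `relational_io_range` and its parameters are hypothetical; the per-join figures are the ones quoted on the slide and are rough estimates, not measurements.

```python
def relational_io_range(tables, index_ios=(3, 4), row_ios=(1, 2)):
    """Return (low, high) IO estimates for joining the given number of tables.

    Each join is assumed to cost some index IOs plus some row IOs, using
    the low end of both ranges for the low estimate and the high end of
    both for the high estimate.
    """
    per_join_low = index_ios[0] + row_ios[0]
    per_join_high = index_ios[1] + row_ios[1]
    return tables * per_join_low, tables * per_join_high


low, high = relational_io_range(13)
print(low, high)  # 52 78
```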

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Relational equivalent (three tables):

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
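To make the comparison concrete, the following Python sketch shreds the multi-valued `personAddresses` property into rows for the Addresses and Persons Addresses tables. The `shred_addresses` helper and the generated address IDs are assumptions of the sketch, and the Is Primary column is left out for brevity.

```python
person_doc = {
    "personId": 11,
    "personAddresses": [
        {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }
    ],
}


def shred_addresses(doc):
    """Split a person's addresses into Addresses and Persons Addresses rows.

    Address IDs are generated here (1, 2, ...) because the document has no
    need for them; they are an assumption of this sketch.
    """
    addresses, person_addresses = [], []
    for address_id, addr in enumerate(doc["personAddresses"], start=1):
        addresses.append({
            "addressId": address_id,
            "street": " ".join(addr["addressStreet"]),
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postalCode": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        # One link row per purpose: a multi-valued property becomes many rows.
        for purpose in addr["addressPurpose"]:
            person_addresses.append({
                "personId": doc["personId"],
                "addressId": address_id,
                "addressPurpose": purpose,
            })
    return addresses, person_addresses


addresses, person_addresses = shred_addresses(person_doc)
print(len(addresses), len(person_addresses))  # 1 2
```

One array entry with two purposes turns into one address row plus two link rows, which is the join work the single document avoids.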

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
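A short Python sketch shows how application code might consume such a mixed array. The `describe` function is hypothetical; it picks fields according to the shape of each record rather than forcing every payment method into one set of columns.

```python
payment_methods = [
    {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
     "creditCardNumber": "1234123412341234",
     "creditCardExpirationDate": "2011-11-11",
     "nameOnCreditCard": "MIKE BOWERS"},
    {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
     "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
     "nameOnBankAccount": "Michael Bowers"},
]


def describe(method):
    """Return a short masked label, choosing fields by record shape."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


labels = [describe(m) for m in payment_methods]
print(labels)
# ['Visa card ending 1234', 'Checking account ending 1111']
```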

Document PROs: Everything is Data

Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays

Because structure is data:
• Structure can be changed by updating data (just write a query)
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
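The claim that structure can be queried and changed like any other data can be sketched in a few lines of Python. This is only an illustration over an in-memory dictionary, not MarkLogic's query API; the helper names `has_key` and `paths_to` are hypothetical, and the sample document mirrors the person example above.

```python
def has_key(node, key):
    """Return True if `key` appears anywhere in the nested structure."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def paths_to(node, key, prefix=()):
    """Yield the path (keys and array indexes) to every occurrence of `key`."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield prefix + (k,)
            yield from paths_to(v, key, prefix + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from paths_to(item, key, prefix + (i,))


person = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query for presence of a key, and for where it is nested.
before = list(paths_to(person, "hidePhoneNumber"))

# Query for a nested relationship: phone numbers that sit under personPhones.
nested = [p for p in paths_to(person, "phoneNumber") if "personPhones" in p]

# Change the structure by updating data: give every phone the optional key.
for entry in person["person"]["personPhones"]:
    entry["phone"].setdefault("hidePhoneNumber", False)
```

Because the optional key is just data, adding it to every phone is an ordinary update rather than a schema migration.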

Document PROs: Simple, Easy Data Types

– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
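As a small illustration of these data types, the sketch below round-trips one document through Python's standard `json` module. Every key and value in it is made up for the example.

```python
import json

# Hypothetical document; all names and values are illustrative only.
doc = {
    "a key with spaces, punctuation & ünïcödé": "naïve café, 東京, Ελληνικά",
    "integer": 42,
    "decimal": 5.75,
    "flag": True,
    "mixedArray": [1, "two", 3.0, False, None, {"nested": ["deeper"]}],
    "sparseObject": {},  # an object may omit any attribute it does not need
}

# ensure_ascii=False keeps the international characters readable in the output.
text = json.dumps(doc, ensure_ascii=False)
restored = json.loads(text)
```

The round trip preserves the mixed-type array and the distinction between integers and decimals without any type declarations.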

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6

June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
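The access path dependence described in the excerpt can be illustrated with a short Python sketch. The two record layouts and the function names are hypothetical; the point is only that code tied to one path breaks when the structure changes, while code that searches by key does not.

```python
# Two hypothetical layouts of the same record.
old_layout = {"order": {"customer": {"name": "Mike Bowers"}}}
new_layout = {"order": {"parties": {"customer": {"name": "Mike Bowers"}}}}


def name_by_fixed_path(doc):
    """Depends on one specific access path through the tree."""
    return doc["order"]["customer"]["name"]


def find_first(node, key):
    """Search the whole structure for `key`, wherever it is nested."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_first(child, key)
        if found is not None:
            return found
    return None


def name_by_search(doc):
    """Independent of where the customer object sits in the document."""
    customer = find_first(doc, "customer")
    return customer["name"] if customer else None


try:
    name_by_fixed_path(new_layout)
    fixed_path_survived = True
except KeyError:
    fixed_path_survived = False
```

The fixed-path function works on the old layout and raises `KeyError` on the new one, while the search-based function returns the name from both.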

[Figure: graph diagram. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. JSON documents for Customers, Orders, Products, and Vendors, plus XML and Documentation content, linked by relationships: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product. The XML example is a hospital operation record (Hospital Name, Operation Number, Operation Type, Surgeon Name, and a table of drugs with manufacturer and dose).]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name:     John Hopkins
Operation Number:  13
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size   Dose UOM
Minicillan   Drugs R Us           200         mg
Maxicillan   Canada4Less Drugs    400         mg
Minicillan   Drug USA             150         mg


What would a real Customer Data Model look like?

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model

[Figure: the same customer ERD tables as listed above.]

Person ERD

The person ERD has 13 tables because each multivalued property requires a separate table in a relational model. All of this should be represented as one JSON document because it is one business entity.
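A minimal sketch of why multivalued properties multiply tables, using Python's built-in `sqlite3` module. The table and column names are assumptions chosen for illustration (they are not the slide's exact ERD), and the `person_docs` table is only a stand-in for a document store.

```python
import json
import sqlite3

person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_phones (
        phone_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES persons(person_id),
        phone_number TEXT);
    CREATE TABLE person_phone_purposes (
        phone_id INTEGER REFERENCES person_phones(phone_id),
        purpose TEXT);
    CREATE TABLE person_docs (person_id INTEGER PRIMARY KEY, body TEXT);
""")

# Relational: each multivalued property (phones, purposes) needs its own table.
pid = person_doc["personId"]
db.execute("INSERT INTO persons VALUES (?, ?)",
           (pid, person_doc["person"]["personName"]))
for entry in person_doc["person"]["personPhones"]:
    phone = entry["phone"]
    cur = db.execute(
        "INSERT INTO person_phones (person_id, phone_number) VALUES (?, ?)",
        (pid, phone["phoneNumber"]))
    for purpose in phone["phonePurpose"]:
        db.execute("INSERT INTO person_phone_purposes VALUES (?, ?)",
                   (cur.lastrowid, purpose))

# Document: the whole business entity is stored and read back as one unit.
db.execute("INSERT INTO person_docs VALUES (?, ?)", (pid, json.dumps(person_doc)))

# Reassembling the shredded entity takes a join per multivalued property.
rows = db.execute("""
    SELECT p.name, ph.phone_number, pp.purpose
    FROM persons p
    JOIN person_phones ph ON ph.person_id = p.person_id
    JOIN person_phone_purposes pp ON pp.phone_id = ph.phone_id
    ORDER BY ph.phone_id, pp.purpose
""").fetchall()

restored = json.loads(db.execute(
    "SELECT body FROM person_docs WHERE person_id = ?", (pid,)).fetchone()[0])
```

Two multivalued properties already require three tables and a two-join query, whereas the document comes back intact from a single lookup.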

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 10 } },
    { "schema": { "type": "customer", "schemaVersion": 10 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 10 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ],
          "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}


[Figure: relational ERD with many tables, including Order, Order Items, and Lookup Lists.]

More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities
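These two points can be sketched in plain Python: each business entity is one document, and named relationships between documents are stored as subject, predicate, object triples. The URIs, the in-memory stores, and the predicates `placedOrder`, `orderedProduct`, and `soldProduct` are assumptions for illustration (`likesProduct` appears in the deck's own person document); this is not MarkLogic's API.

```python
# Hypothetical in-memory stand-ins for a document store and a triple store.
documents = {
    "/customers/11.json": {"personId": 11,
                           "person": {"personName": "Mike Bowers"}},
    "/orders/1.json": {"orderId": 1, "order": {"orderStatus": "Shipped"}},
    "/products/1111.json": {"productId": 1111,
                            "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "/companies/555.json": {"companyId": 555,
                            "company": {"companyName": "Nabisco"}},
}

# Each relationship carries its meaning in the predicate.
triples = {
    ("/customers/11.json", "placedOrder", "/orders/1.json"),
    ("/orders/1.json", "orderedProduct", "/products/1111.json"),
    ("/companies/555.json", "soldProduct", "/products/1111.json"),
    ("/customers/11.json", "likesProduct", "/products/1111.json"),
}


def objects(subject, predicate):
    """Follow a relationship forward from a document."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(predicate, obj):
    """Follow a relationship backward to the documents that point here."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def products_ordered_by(customer_uri):
    """Customer -> orders -> products, returning product names."""
    names = []
    for order_uri in objects(customer_uri, "placedOrder"):
        for product_uri in objects(order_uri, "orderedProduct"):
            names.append(documents[product_uri]["product"]["productName"])
    return names


vendors = [documents[uri]["company"]["companyName"]
           for uri in subjects("soldProduct", "/products/1111.json")]
```

New kinds of relationships, such as a customer liking a product, are added as new triples without changing any document's structure.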


[Figure: Customer Order model. JSON documents for Customers, Orders, Products, and Vendors, linked by the relationships "Product that was ordered" and "Vendor who sold the product".]

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ],
          "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air",
      "shipmentCost": 5.87,
      "shipmentNotes": ""
    },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3,
          "productUnitPrice": 3.50,
          "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1,
          "productUnitPrice": 2.50,
          "productCondition": "New" } }
    ]
  }
}

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
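The normalization steps above can be sketched with Python's built-in `sqlite3` module. The schema below loosely follows the Persons, Addresses, and Persons Addresses tables from the customer model; the column names and the second person are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- One primary key per table, one column per single-valued attribute.
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL);
    CREATE TABLE addresses (
        address_id INTEGER PRIMARY KEY,
        street TEXT, locality TEXT, region TEXT,
        postal_code TEXT, country TEXT);
    -- The multivalued "addresses of a person" becomes a many-to-many join table.
    CREATE TABLE persons_addresses (
        person_id INTEGER NOT NULL REFERENCES persons(person_id),
        address_id INTEGER NOT NULL REFERENCES addresses(address_id),
        address_purpose TEXT NOT NULL,
        is_primary INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (person_id, address_id, address_purpose));
""")

db.execute("INSERT INTO persons VALUES (11, 'Mike Bowers')")
db.execute("INSERT INTO persons VALUES (22, 'Example Spouse')")  # hypothetical row
# The shared address is stored once, so no attribute is duplicated across rows.
db.execute("INSERT INTO addresses VALUES "
           "(1, '99 Smith Dr', 'Clinton', 'UT', '84015', 'United States')")
db.executemany(
    "INSERT INTO persons_addresses VALUES (?, ?, ?, ?)",
    [(11, 1, "shipping", 1), (11, 1, "mailing", 0), (22, 1, "mailing", 1)])

rows = db.execute("""
    SELECT p.name, pa.address_purpose, a.street
    FROM persons p
    JOIN persons_addresses pa ON pa.person_id = p.person_id
    JOIN addresses a ON a.address_id = pa.address_id
    ORDER BY p.person_id, pa.address_purpose
""").fetchall()

address_count = db.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
```

Every read of a person's addresses now goes through the join table, which is the cost the relational model pays for storing each address exactly once.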


[Figure: example tables illustrating one-to-one, one-to-many, and many-to-many relationships and reference tables.]

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it


DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
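The idea of "denormalizing without denormalizing" can be approximated in plain Python. This is a conceptual sketch only: `project_triples` is a hypothetical stand-in for what a template does, `order_view` stands in for a query over the index, and neither reflects the actual TDE template format or Optic API calls. The predicates `hasName`, `hasEmail`, `hasStatus`, and `placedBy` are invented for the example.

```python
order_doc = {"orderId": 1,
             "order": {"orderStatus": "Shipped", "customer": {"customerId": 11}}}
person_doc = {"personId": 11,
              "person": {"personName": "Mike Bowers",
                         "personEmails": [
                             {"email": {"emailPurpose": ["work", "primary"],
                                        "emailAddress": "mike@somework.com"}}]}}


def project_triples(doc):
    """Derive (subject, predicate, object) facts from one document."""
    facts = []
    if "person" in doc:
        subject = ("person", doc["personId"])
        facts.append((subject, "hasName", doc["person"]["personName"]))
        for entry in doc["person"].get("personEmails", []):
            facts.append((subject, "hasEmail", entry["email"]["emailAddress"]))
    if "order" in doc:
        subject = ("order", doc["orderId"])
        facts.append((subject, "hasStatus", doc["order"]["orderStatus"]))
        customer_id = doc["order"]["customer"]["customerId"]
        facts.append((subject, "placedBy", ("person", customer_id)))
    return facts


# The index is derived from the documents; the documents themselves are unchanged.
triple_index = [t for doc in (order_doc, person_doc) for t in project_triples(doc)]


def value_of(subject, predicate):
    return [o for s, p, o in triple_index if s == subject and p == predicate]


def order_view(order_id):
    """Read an order as if the customer's name and emails were embedded in it."""
    order = ("order", order_id)
    row = {"orderId": order_id, "orderStatus": value_of(order, "hasStatus")[0]}
    for customer in value_of(order, "placedBy"):
        row["customerName"] = value_of(customer, "hasName")[0]
        row["customerEmails"] = value_of(customer, "hasEmail")
    return row


view = order_view(1)
```

The order document stores only the customer's identifier, yet the view returns the name and email as if they had been copied into the order. Because the index is derived, rebuilding it after a change to the person document refreshes every order view with no order document being rewritten.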


Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context


[Figure: example business tables, transaction tables, and reference tables.]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc

Relational Modeling Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
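The subclassing in the person document above can be sketched in Python. This is a minimal illustration, assuming a trimmed JSON layout in which the top-level `customer` and `companyRelationships` keys act as optional subclasses of the person. The exact nesting is an assumption because the slide's punctuation was lost, and `roles_of` is a hypothetical helper.

```python
import json

# Trimmed, assumed layout of the person document from the slide.
person_doc = json.loads("""
{
  "personId": 11,
  "person": {"personName": "Mike Bowers", "personStatus": "active"},
  "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
  "companyRelationships": [
    {"companyIdFk": 555,
     "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
    {"companyIdFk": 666,
     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}
  ]
}
""")


def roles_of(doc):
    """List the roles a person document plays, based on which keys are present."""
    roles = ["person"]
    if "customer" in doc:
        roles.append("customer")
    for relationship in doc.get("companyRelationships", []):
        roles.extend(relationship["personCompanyRelationship"])
    return roles


print(roles_of(person_doc))
```

A document without a `customer` key is still a valid person; the subclass is simply absent rather than stored as empty columns.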

Relational Modeling Tune

4. Tune

• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
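To see why unique property names matter when an equality index is keyed by name alone, consider the toy model below. It is only a sketch under that assumption and not MarkLogic's actual index implementation; `walk`, `build_equality_index`, and the two tiny corpora are hypothetical.

```python
from collections import defaultdict


def walk(node, name=None):
    """Yield (property_name, scalar_value) pairs found at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, name)   # array items keep the array's name
    elif name is not None:
        yield name, node


def build_equality_index(docs):
    """Map (property_name, value) to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for name, value in walk(doc):
            index[(name, value)].add(doc_id)
    return index


# Generic names collide across entity types...
generic = {
    "person-11": {"person": {"status": "active"}},
    "company-555": {"company": {"status": "active"}},
}
# ...while entity-prefixed names, as in the slide's documents, do not.
prefixed = {
    "person-11": {"person": {"personStatus": "active"}},
    "company-555": {"company": {"companyStatus": "active"}},
}

print(sorted(build_equality_index(generic)[("status", "active")]))
print(sorted(build_equality_index(prefixed)[("personStatus", "active")]))
```

With the generic name, a lookup for active persons also returns the company; with the prefixed name it returns only the person.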

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
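The IO estimate above is simple arithmetic, reproduced here as a short Python sketch. The per-join figures come from the slide; the constant and function names are hypothetical.

```python
TABLES_JOINED = 13            # tables needed for one person (from the slide)
INDEX_IOS_PER_JOIN = (3, 4)   # low and high estimate for index reads
ROW_IOS_PER_JOIN = (1, 2)     # low and high estimate for the row read
DOCUMENT_IOS = 1              # one document fetch when indexes are in RAM


def io_range_per_join():
    """Combine index and row IOs into a (low, high) range per join."""
    low = INDEX_IOS_PER_JOIN[0] + ROW_IOS_PER_JOIN[0]
    high = INDEX_IOS_PER_JOIN[1] + ROW_IOS_PER_JOIN[1]
    return low, high


def relational_io_range(tables=TABLES_JOINED):
    """Scale the per-join range by the number of tables joined."""
    low, high = io_range_per_join()
    return low * tables, high * tables


low, high = relational_io_range()
print(f"relational: {low}-to-{high} IOs, document: {DOCUMENT_IOS} IO")
# prints "relational: 52-to-78 IOs, document: 1 IO"
```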


Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Templated Data Extraction

Document PROs Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Relational tables shown for comparison:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
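The contrast between the single document and the relational tables can be sketched by shredding the document's address array into rows. This is an illustrative sketch: the JSON layout, the surrogate `address_id`, and the `shred` helper are assumptions, and the Is Primary column is omitted because the sample address carries no such flag.

```python
# Assumed JSON layout of the address fragment shown on the slide.
person_doc = {
    "personId": 11,
    "personAddresses": [
        {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }
    ],
}


def shred(doc):
    """Split the multi-valued address array into two relational-style tables."""
    addresses, persons_addresses = [], []
    for address_id, address in enumerate(doc["personAddresses"], start=1):
        addresses.append({
            "address_id": address_id,                      # surrogate key (assumed)
            "street": ", ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # One join row per purpose: the array becomes a many-to-many table.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "person_id": doc["personId"],
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return addresses, persons_addresses


addresses, persons_addresses = shred(person_doc)
print(len(addresses), "address row,", len(persons_addresses), "join rows")
```

One nested array in the document turns into one address row plus two join rows, before the Persons table is even counted.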

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed

• Storing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
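A short Python sketch shows how application code can handle such a mixed array by inspecting which properties each record carries. The list layout and the `describe` helper are assumptions made for illustration; the values come from the slide.

```python
# Two differently shaped records in one array (assumed JSON layout).
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    },
]


def describe(method):
    """Summarize a payment method based on the properties it actually has."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


for method in payment_methods:
    print(describe(method))
```

In a relational model the same data would typically need either one wide table with many empty columns or one table per payment type.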


Document PROs Everything is Data

Everything is data in JSON

• Structure
• Keys
• Values
• Arrays

Because structure is data

• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 575
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
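Querying structure can be illustrated with a few lines of Python that search a document for the presence of a key at any depth. This is a sketch, not a MarkLogic query: `paths_with_key` is a hypothetical helper and the nesting of the sample document is assumed.

```python
def paths_with_key(node, key, path=()):
    """Yield the path to every occurrence of `key`, at any depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            here = path + (name,)
            if name == key:
                yield here
            yield from paths_with_key(value, key, here)
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from paths_with_key(item, key, path + (position,))


# Assumed JSON layout of the person document shown above.
doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}

# Query the structure: which phones carry the optional key?
before = list(paths_with_key(doc, "hidePhoneNumber"))

# Change the structure by updating data: add the key to the first phone.
doc["person"]["personPhones"][0]["hidePhoneNumber"] = False
after = list(paths_with_key(doc, "hidePhoneNumber"))

print(before)
print(after)
```

No schema change is involved: adding the key to one more record is an ordinary data update, and the same query immediately sees it.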

Document PROs Simple Easy Data Types

– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
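As a quick check of these data types, the following sketch parses a small JSON text with Python's standard `json` module and reports the type each value maps to. The sample text is made up for illustration; in particular the height of 5.75 and the `mixedArray` property are assumed values, not taken from the slides.

```python
import json

# Illustrative JSON text covering each data type (values are assumptions).
text = """
{
  "personId": 11,
  "personName": "Mike Bowers",
  "personHeightInFeet": 5.75,
  "hidePhoneNumber": true,
  "personMaidenName": null,
  "mixedArray": ["home", 2, 2.5, false, null, {"nested": []}]
}
"""

doc = json.loads(text)

# Python type that each top-level value was parsed into.
property_types = {name: type(value).__name__ for name, value in doc.items()}

# One array can mix every type, including nested objects.
array_types = [type(item).__name__ for item in doc["mixedArray"]]

print(property_types)
print(array_types)
```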

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION

This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS

… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence. … Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of linked entities. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. Document collections: Customers (JSON), Orders (JSON), Products (JSON), Documentation (XML), Vendors (JSON). Labels in the diagram: product that was ordered; vendor who sold the product; shipping instructions, etc.; customer invoices, annual reports, etc.; customer order documents; product manual; customer liked product; vendor received RMA on product.]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size     Dose UOM
Minicillan    Drugs R Us            200           mg
Maxicillan    Canada4Less Drugs     400           mg
Minicillan    Drug USA              150           mg


Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Customer Model


The person ERD needs 13 tables because, in a relational model, each multivalued property requires a separate table.

All of this should be represented as one JSON document because it is one business entity.
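The point about multivalued properties can be sketched in a few lines. The following is a minimal illustration, not code from the presentation: it assumes three of the 13 tables have been read into Python lists (the row layouts and the helper name `assemble_person` are hypothetical) and nests the child rows as arrays inside one person document.

```python
import json

# Hypothetical rows, as they might be read from three of the 13 tables.
persons = [{"person_id": 11, "name": "Mike Bowers"}]
person_phones = [
    {"phone_id": 1, "person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"phone_id": 2, "person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails = [
    {"email_id": 1, "person_id": 11, "purpose": "work", "address": "mike@somework.com"},
]


def assemble_person(person_id):
    """Join the child rows back onto their parent and nest them as arrays."""
    person = next(p for p in persons if p["person_id"] == person_id)
    return {
        "personId": person["person_id"],
        "person": {
            "personName": person["name"],
            "personPhones": [
                {"phonePurpose": r["purpose"], "phoneNumber": r["number"]}
                for r in person_phones
                if r["person_id"] == person_id
            ],
            "personEmails": [
                {"emailPurpose": r["purpose"], "emailAddress": r["address"]}
                for r in person_emails
                if r["person_id"] == person_id
            ],
        },
    }


doc = assemble_person(11)
print(json.dumps(doc, indent=2))
```

Each one-to-many table becomes an array property, so the join that a relational query would perform on every read is done once, when the document is written.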

Person ERD

One Person JSON Doc = 13 Tables

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]

triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active
    tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06
    tripleEffectivityEndDate null
]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers, Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free
  personPhones [
    phone phonePurpose [ mobile, primary ] phoneNumber +1 (111) 111-1111
    phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
    phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
  ]
  personAddresses [
    address
      addressPurpose [ shipping, mailing ]
      addressStreet [ 99 Smith Dr ]
      addressLocality Clinton
      addressRegion UT
      addressPostalCode 84015
      addressCountry United States
  ]
  personEmails [
    email emailPurpose [ work, primary ] emailAddress mike@somework.com
    email emailPurpose [ personal ] emailAddress mike@somewhere.com
  ]
  personWebsites [
    website websitePurpose [ work ] websiteUrl http://www.mike.com
    website websitePurpose [ linkedin, primary ] websiteUrl https://linked.com/mike
  ]
  personPaymentMethods [
    paymentMethod
      paymentMethodType Visa
      paymentMethodStatus Verified
      creditCardNumber 1234123412341234
      creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS
      creditCardValidationAddress mailing
    paymentMethod
      paymentMethodType Checking
      paymentMethodStatus Verified
      bankRoutingNumber 11111111
      bankAccountNumber 11111111
      nameOnBankAccount Michael Bowers
      driversLicenseNumber 11111111
      driversLicenseState UT
  ]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company
    companyIdFk 555
    companyName Nabisco
    personCompanyRelationship [ employeeOf, accountRepFor ]
    personCompanyStatus active
    personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01
    personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company
    companyIdFk 666
    companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor, deliveryPersonFor ]
    personCompanyStatus active
    personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01
    personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]

[Figure: entity-relationship diagram for the more realistic customer order model. It repeats the customer model tables listed above and adds further tables, including Order, Order Items, Fact, and Lookup Lists.]

More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: Customer Order model. Customers, Orders, Products, and Vendors are JSON documents, linked by the relationships "product that was ordered" and "vendor who sold the product".]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Same Realistic Customer Order Model

Relational Modeling Normalize

1. Normalize

• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure panels: One-to-one, One-to-many, Many-to-many, Reference Tables]
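The normalization steps above can be illustrated with a small sketch. It is an assumption-laden example rather than anything shown in the slides: the input record shape and the function name `normalize` are hypothetical, and only one multivalued attribute (phones) is split out into a one-to-many table.

```python
from itertools import count


def normalize(people):
    """Split records with a multivalued 'phones' attribute into two tables.

    Each output table gets its own primary key, and the child table
    points back to its parent through a person_id foreign key.
    """
    phone_ids = count(1)
    persons, person_phones = [], []
    for p in people:
        persons.append({"person_id": p["id"], "name": p["name"]})
        for number in p["phones"]:
            person_phones.append(
                {
                    "phone_id": next(phone_ids),
                    "person_id": p["id"],
                    "phone_number": number,
                }
            )
    return persons, person_phones


people = [
    {
        "id": 11,
        "name": "Mike Bowers",
        "phones": ["+1 (111) 111-1111", "+1 (111) 111-1112"],
    }
]
persons, person_phones = normalize(people)
```

Every further multivalued attribute (emails, addresses, websites) would need the same treatment, which is how a single person grows into many tables.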

DocGraph Modeling Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
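One possible reading of "each embedded structure should be normalized" is that an embedded array should not repeat the same fact in several entries. The sketch below assumes that reading; the helper name `normalize_phones` and the raw input shape are hypothetical. It keeps one entry per phone number and gathers its purposes into an array, matching the `phonePurpose [ mobile primary ]` shape used in the person document.

```python
def normalize_phones(phones):
    """Keep one entry per phone number and gather its purposes into an array."""
    merged = {}  # phone number -> list of purposes, in first-seen order
    for entry in phones:
        purposes = merged.setdefault(entry["phoneNumber"], [])
        if entry["phonePurpose"] not in purposes:
            purposes.append(entry["phonePurpose"])
    return [
        {"phonePurpose": purposes, "phoneNumber": number}
        for number, purposes in merged.items()
    ]


raw_phones = [
    {"phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
    {"phonePurpose": "primary", "phoneNumber": "+1 (111) 111-1111"},
    {"phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
]
person_phones = normalize_phones(raw_phones)
```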


DocGraph Modeling Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
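The idea of denormalizing through an index can be mimicked in plain Python. This sketch does not use MarkLogic's TDE or Optic API; it is only an analogue under stated assumptions: the `project_triples` and `employer_names` helpers and the `hasName` predicate are hypothetical, and the triples are stored as simple objects rather than in MarkLogic's embedded format.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "triples": [{"subject": 11, "predicate": "hasEmployer", "object": 666}],
}
company_doc = {"companyId": 666, "company": {"companyName": "Goliath Dairies"}}


def project_triples(doc):
    """Derive index entries from a document without changing the document."""
    triples = [
        (t["subject"], t["predicate"], t["object"]) for t in doc.get("triples", [])
    ]
    if "person" in doc:
        triples.append((doc["personId"], "hasName", doc["person"]["personName"]))
    if "company" in doc:
        triples.append((doc["companyId"], "hasName", doc["company"]["companyName"]))
    return triples


# The "index" is just the union of what each document projects.
index = project_triples(person_doc) + project_triples(company_doc)


def employer_names(person_id):
    """Follow hasEmployer links, then look up each employer's name in the index."""
    employers = {o for s, p, o in index if s == person_id and p == "hasEmployer"}
    return sorted(o for s, p, o in index if s in employers and p == "hasName")


print(employer_names(11))
```

The company name is reachable from the person through the index, yet it is never copied into the person document, which is the sense in which nothing is actually denormalized.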


Relational Modeling Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure panels: Business Tables, Transaction Tables, Reference Tables]
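The three kinds of tables can be sketched with Python's built-in `sqlite3` module. The schema below is a hypothetical, heavily reduced illustration (table and column names are assumptions, not the presentation's model): two business tables, one reference table, and one transaction table that joins them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Business tables: independent of any one context.
    CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
    -- Transaction table: joins business tables together.
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        person_id  INTEGER REFERENCES persons(person_id),
        product_id INTEGER REFERENCES products(product_id),
        status     TEXT    REFERENCES order_statuses(status)
    );
    INSERT INTO persons  VALUES (11, 'Mike Bowers');
    INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies');
    INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO orders VALUES (1, 11, 1111, 'Shipped');
    """
)

# Recreating the "customer order" context takes a join across all tables.
rows = conn.execute(
    """
    SELECT p.name, pr.name, o.status
    FROM orders o
    JOIN persons  p  ON p.person_id   = o.person_id
    JOIN products pr ON pr.product_id = o.product_id
    """
).fetchall()
print(rows)
```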


DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc
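To illustrate a transaction that contains everything about itself, the sketch below builds an order document by copying the customer name and shipping address out of a person document. The helper `build_order` is hypothetical and the property names are borrowed from the sample documents; treat it as one possible approach, not the author's implementation.

```python
import copy

person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
            }
        ],
    },
}


def build_order(order_id, person_doc, products):
    """Copy what the transaction needs, so the order stands on its own."""
    shipping = next(
        a
        for a in person_doc["person"]["personAddresses"]
        if "shipping" in a["addressPurpose"]
    )
    address = {
        k: copy.deepcopy(v) for k, v in shipping.items() if k != "addressPurpose"
    }
    return {
        "orderId": order_id,
        "order": {
            "customer": {
                "customerId": person_doc["personId"],
                "customerName": person_doc["person"]["personName"],
            },
            "orderShippingAddress": address,
            "orderedProducts": copy.deepcopy(products),
        },
    }


order_doc = build_order(1, person_doc, [{"productIdFk": 1111, "productOrderQty": 3}])

# A later change of address does not rewrite the historical order.
person_doc["person"]["personAddresses"][0]["addressLocality"] = "Elsewhere"
```

Because the order holds its own copy of the address, it still records where the shipment actually went after the customer moves.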

Relational Modeling Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not overgeneralize in relational because it hides the purpose of the model
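The person-and-subclass example can be sketched with `sqlite3`. This is a minimal illustration under assumptions: the table names, columns, and the `roles_of` helper are hypothetical, and each subclass table simply shares the person's primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- General table: one row per person, whatever roles they play.
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
    -- Subclass tables share the person's key and add role-specific columns.
    CREATE TABLE customers (
        person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
        join_date TEXT
    );
    CREATE TABLE account_reps (
        person_id  INTEGER PRIMARY KEY REFERENCES persons(person_id),
        company_id INTEGER
    );
    INSERT INTO persons      VALUES (11, 'Mike Bowers');
    INSERT INTO customers    VALUES (11, '2001-01-01');
    INSERT INTO account_reps VALUES (11, 555);
    """
)


def roles_of(person_id):
    """List the subclass tables that hold a row for this person."""
    found = []
    for table in ("customers", "account_reps"):  # fixed names, not user input
        hit = conn.execute(
            f"SELECT 1 FROM {table} WHERE person_id = ?", (person_id,)
        ).fetchone()
        if hit:
            found.append(table)
    return found


print(roles_of(11))
```

Finding out what a person is requires probing every subclass table, which hints at how generalization can obscure the purpose of a relational model.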

DocGraph Modeling Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]

triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null
]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free

personPhones [
  phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
  phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
  phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
]

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

personEmails [
  email emailPurpose [ work primary ] emailAddress mikesomeworkcom
  email emailPurpose [ personal ] emailAddress mikesomewherecom
]

personWebsites [
  website websitePurpose [ work ] websiteUrl httpwwwmikecom
  website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike
]

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company
    companyIdFk 555
    companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active
    personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01
    personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company
    companyIdFk 666
    companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active
    personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01
    personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]
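One way to read the subclassing in the document above: the core person section is extended by optional sibling sections (customer, companyRelationships), and the sections present tell you which roles the person plays. The following is a minimal Python sketch under that assumption. It uses a cut-down version of the document, and the `roles_of` helper is hypothetical, not a MarkLogic function.

```python
# Cut-down person document; only the sections needed for the sketch.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"companyIdFk": 555, "companyName": "Nabisco",
         "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
        {"companyIdFk": 666, "companyName": "Goliath Dairies",
         "personCompanyRelationship": ["independentContractorFor",
                                       "deliveryPersonFor"]},
    ],
}


def roles_of(doc):
    """Return the roles a person document plays, read from its sections."""
    roles = ["person"]
    # An optional "customer" section subclasses the person as a customer.
    if "customer" in doc:
        roles.append("customer")
    # Each company relationship contributes its own relationship types.
    for relationship in doc.get("companyRelationships", []):
        roles.extend(relationship["personCompanyRelationship"])
    return roles


print(roles_of(person_doc))
```

Adding a new role in this style means adding a new section to the document, which leaves the existing sections untouched.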

Relational Modeling Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance


DocGraph Modeling Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
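The unique-name guideline above can be checked mechanically before documents are loaded. The sketch below is plain Python, not a MarkLogic API. The helper names `paths_by_name` and `ambiguous_names` are hypothetical, and the two sample documents are made up for illustration. It reports property names that occur under more than one parent path, which are the names a name-based index could not tell apart.

```python
from collections import defaultdict


def paths_by_name(node, path=(), found=None):
    """Map each property name to the set of parent paths where it occurs."""
    if found is None:
        found = defaultdict(set)
    if isinstance(node, dict):
        for key, value in node.items():
            found[key].add(path)
            paths_by_name(value, path + (key,), found)
    elif isinstance(node, list):
        # Array items share the path of the array itself.
        for item in node:
            paths_by_name(item, path, found)
    return found


def ambiguous_names(doc):
    """Return property names that appear under more than one parent path."""
    return sorted(name for name, paths in paths_by_name(doc).items()
                  if len(paths) > 1)


generic = {"person": {"name": "Mike Bowers"}, "company": {"name": "Nabisco"}}
unique = {"person": {"personName": "Mike Bowers"},
          "company": {"companyName": "Nabisco"}}

print(ambiguous_names(generic))  # ['name']
print(ambiguous_names(unique))   # []
```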

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Document PROs Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
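The IO estimate on this slide is simple arithmetic, sketched here in Python. The per-join figures are the slide's own (3-to-4 index IOs plus 1-to-2 row IOs); the function name `join_io_range` is hypothetical.

```python
def join_io_range(tables, index_ios=(3, 4), row_ios=(1, 2)):
    """Estimate the (low, high) IO count for joining `tables` tables.

    Each join is assumed to cost index IOs plus row IOs, using the
    ranges quoted on the slide.
    """
    low = tables * (index_ios[0] + row_ios[0])
    high = tables * (index_ios[1] + row_ios[1])
    return low, high


print(join_io_range(13))  # (52, 78)
```

By the same reasoning, a single document fetch with in-memory indexes is counted as 1 IO, which is the comparison the slide draws.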


Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Equivalent relational tables:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
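To make the comparison concrete, the sketch below shreds the embedded address array into rows for the Addresses and Persons Addresses tables. It is an illustration in plain Python: the `shred_addresses` helper is hypothetical, and the generated address IDs are assumed surrogate keys that do not exist in the document.

```python
person_doc = {
    "personId": 11,
    "personAddresses": [
        {"addressPurpose": ["shipping", "mailing"],
         "addressStreet": ["99 Smith Dr"],
         "addressLocality": "Clinton", "addressRegion": "UT",
         "addressPostalCode": "84015", "addressCountry": "United States"},
    ],
}


def shred_addresses(doc):
    """Split the embedded address array into rows for two tables."""
    addresses, persons_addresses = [], []
    # Address IDs are assumed surrogate keys, numbered from 1.
    for address_id, addr in enumerate(doc["personAddresses"], start=1):
        addresses.append({
            "addressId": address_id,
            "street": ", ".join(addr["addressStreet"]),
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postalCode": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        # A multi-valued purpose becomes one junction row per value.
        for purpose in addr["addressPurpose"]:
            persons_addresses.append({
                "personId": doc["personId"],
                "addressId": address_id,
                "addressPurpose": purpose,
            })
    return addresses, persons_addresses


addresses, persons_addresses = shred_addresses(person_doc)
print(len(addresses), len(persons_addresses))  # 1 2
```

One array entry with two purposes turns into one address row and two junction rows, which is the shredding the document model avoids.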

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed
• Different types of records in the same array are natural in programming but very difficult to do in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
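A short Python sketch of why mixed record types in one array are natural in code: each entry carries its own type tag, and the program branches on it. The `describe` helper is hypothetical, and the field values are copied from the example above.

```python
payment_methods = [
    {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
     "creditCardNumber": "1234123412341234",
     "nameOnCreditCard": "MIKE BOWERS"},
    {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
     "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
     "nameOnBankAccount": "Michael Bowers"},
]


def describe(method):
    """Summarize a payment method, branching on its type tag."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa card ending in {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending in {method['bankAccountNumber'][-4:]}"
    return f"Unknown payment method: {kind}"


summaries = [describe(method) for method in payment_methods]
print(summaries)
```

In the relational model shown earlier in the deck, the same two records live in separate Visa and bank payment tables, each with its own junction table.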


Document PROs Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values
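The two ideas above, querying structure and changing structure by updating data, can be sketched in plain Python. This is a stand-in for what a database query would do, not MarkLogic code. The helpers `has_key` and `rename_key` are hypothetical, and the target name `phoneNo` is made up for the example.

```python
def has_key(node, key):
    """Return True if `key` appears anywhere in the nested structure."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def rename_key(node, old, new):
    """Return a copy of the structure with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new)
                for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


doc = {"personId": 11, "person": {"personPhones": [
    {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
    {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
     "hidePhoneNumber": True},
]}}

# Query structure: is a key present anywhere in the document?
print(has_key(doc, "hidePhoneNumber"))
# Change structure: renaming a key is just a data update.
renamed = rename_key(doc, "phoneNumber", "phoneNo")
print(has_key(renamed, "phoneNumber"), has_key(renamed, "phoneNo"))
```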

personId 11

person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 575
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]

Document PROs Simple Easy Data Types

• Attribute names have unlimited length and can contain any characters
• Strings have unlimited size and can contain any internationalized UTF-8 characters
• Numbers contain integers or decimals
• Booleans are true or false
• Arrays can have values of any type and can mix types
• Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
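As a quick illustration of the type list above, the sketch below parses a small JSON text with Python's standard `json` module and records the Python type of each top-level value. The property names and values are invented for the example and are not taken from the deck's documents.

```python
import json

# Hypothetical sample covering each JSON type named in the list above.
text = """{
  "sampleName": "Zo\\u00eb M\\u00fcller",
  "sampleHeightInFeet": 5.75,
  "sampleCount": 2,
  "sampleHidden": true,
  "sampleMixedArray": ["text", 1, 2.5, false, null, {"nested": []}],
  "sampleSparseObject": {}
}"""

doc = json.loads(text)
# Map each top-level property to the Python type it parsed into.
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)
```

The mixed array parses without complaint, which is the "arrays can mix types" point; an empty object is equally valid, which is the "sparsely populated" point.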

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval Volume 13 Number 6

June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency. …The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1 Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: graph diagram. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order. Document collections: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), Documentation (XML). Relationship labels: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

XML example:

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size   Dose UOM
Minicillan    Drugs R Us            200         mg
Maxicillan    Canada4Less Drugs     400         mg
Minicillan    Drug USA              150         mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledgeWhat can RDF Graph and JSON do for data
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49



Person ERD

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

The person ERD is 13 tables because each multivalued property requires a separate table in relational.

All of this should be represented as one JSON document because it is one business entity.

One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active",
      "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06",
      "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ],
          "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}
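As a rough illustration of "one document instead of many tables", the sketch below folds rows from three of the relational tables into a single nested person document. The row data, the snake_case column names, and the `build_person_document` helper are hypothetical and only mirror the property names used on this slide.

```python
# Hypothetical rows, as they might come back from three of the 13 tables.
persons = [{"person_id": 11, "name": "Mike Bowers", "status": "active"}]
person_phones = [
    {"person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails = [
    {"person_id": 11, "purpose": "work", "address": "mike@example.com"},
]


def build_person_document(person_id):
    """Fold the child-table rows for one person into a single nested document."""
    person = next(p for p in persons if p["person_id"] == person_id)
    return {
        "personId": person["person_id"],
        "person": {
            "personStatus": person["status"],
            "personName": person["name"],
            # Each multivalued property becomes an embedded array
            # instead of a separate table.
            "personPhones": [
                {"phone": {"phonePurpose": [r["purpose"]], "phoneNumber": r["number"]}}
                for r in person_phones
                if r["person_id"] == person_id
            ],
            "personEmails": [
                {"email": {"emailPurpose": [r["purpose"]], "emailAddress": r["address"]}}
                for r in person_emails
                if r["person_id"] == person_id
            ],
        },
    }


doc = build_person_document(11)
```

Reading the person back is then a single document fetch rather than a join per multivalued property.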

[Figure: the 13-table Person ERD, repeated beside the JSON document for comparison]

More Realistic Customer Order Model

[Figure: ERD with many interrelated tables, including Order, Order Items, Fact, and Lookup Lists]

Same Realistic Customer Order Model

• A JSON document is a business entity
• Meaningful graph relationships connect business entities

[Figure: Customer Order model as four linked JSON documents: Customers, Orders, Products, and Vendors. Graph relationships identify the product that was ordered and the vendor who sold the product.]
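A minimal sketch of following those relationships: starting from an order document, look up each ordered product and then the vendor who sold it. The in-memory dictionaries and the `vendors_for_order` helper are hypothetical stand-ins for a document store; the property names follow the documents on the next slide.

```python
# Hypothetical in-memory "collections", keyed by document ID.
orders = {
    1: {"orderId": 1, "order": {"orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderQty": 3}},
    ]}},
}
products = {
    1111: {"productId": 1111, "product": {
        "productName": "Oreo Thins Sandwich Cookies",
        "productVendor": {"vendorId": 555},
    }},
}
companies = {
    555: {"companyId": 555, "company": {"companyName": "Nabisco"}},
}


def vendors_for_order(order_id):
    """Return (product name, vendor name) pairs for every item on an order."""
    pairs = []
    for item in orders[order_id]["order"]["orderedProducts"]:
        # Order -> product: "product that was ordered".
        product = products[item["product"]["productIdFk"]]
        # Product -> company: "vendor who sold the product".
        vendor = companies[product["product"]["productVendor"]["vendorId"]]
        pairs.append((product["product"]["productName"],
                      vendor["company"]["companyName"]))
    return pairs
```

Each hop is one keyed lookup from one business entity to another, which is the role the graph relationships play in the diagram.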

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New Invoiced Shipped Closed ]

productOrderStatus [ None Allocated Invoiced Shipped On Order No Stock ]

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
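The normalization steps above can be sketched with Python's built-in `sqlite3` module. The multivalued phone attribute moves into its own table with its own primary key and is joined back one-to-many. Table and column names here are illustrative, loosely based on the Person ERD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per coherent context, one primary key per table.
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- The multivalued "phones" attribute becomes a one-to-many child table.
    CREATE TABLE person_phones (
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER NOT NULL REFERENCES persons(person_id),
        phone_purpose TEXT NOT NULL,
        phone_number  TEXT NOT NULL
    );
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO person_phones VALUES
        (1, 11, 'mobile', '+1 (111) 111-1111'),
        (2, 11, 'work',   '+1 (111) 111-1112');
""")

# Reassembling the person requires a join for each multivalued attribute.
rows = conn.execute("""
    SELECT p.name, ph.phone_purpose, ph.phone_number
    FROM persons AS p
    JOIN person_phones AS ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
```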

[Figure: example tables illustrating one-to-one, one-to-many, and many-to-many relationships and reference tables]

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
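To make the last bullet concrete, here is a plain-Python emulation of the idea behind template-driven extraction: walk a document and emit (subject, predicate, object) triples from selected properties. This is not the TDE template syntax or any MarkLogic API; the `extract_triples` function and the predicate names `hasName`, `hasEmail`, and `hasCompany` are invented for illustration.

```python
def extract_triples(doc):
    """Derive (subject, predicate, object) tuples from one person document."""
    subject = doc["personId"]
    triples = [(subject, "hasName", doc["person"]["personName"])]
    # One triple per embedded email structure.
    for entry in doc["person"].get("personEmails", []):
        triples.append((subject, "hasEmail", entry["email"]["emailAddress"]))
    # One triple per related company, using the foreign key as the object.
    for entry in doc.get("companyRelationships", []):
        triples.append((subject, "hasCompany", entry["company"]["companyIdFk"]))
    return triples


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"],
                       "emailAddress": "mike@example.com"}},
        ],
    },
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco"}},
    ],
}

triples = extract_triples(person_doc)
```

The document stays normalized; the triples are a derived, queryable projection of it.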

[Figure: the person JSON document shown earlier]

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
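A toy model of "denormalizing without denormalizing": the company name is never copied into the person document, yet a query over the triples returns it as if it were. The `triple_index` list, the predicate names, and both helper functions are assumptions for illustration and do not represent the Optic API.

```python
# Hypothetical triple index: some triples link documents, others carry data.
triple_index = [
    (11, "hasName", "Mike Bowers"),
    (11, "hasEmployer", 666),
    (666, "hasName", "Goliath Dairies"),
    (555, "hasName", "Nabisco"),
]


def objects(subject, predicate):
    """Return every object stored for a given subject and predicate."""
    return [o for s, p, o in triple_index if s == subject and p == predicate]


def employer_names(person_id):
    """Join person -> employer -> name entirely through the triple index."""
    return [
        name
        for company_id in objects(person_id, "hasEmployer")
        for name in objects(company_id, "hasName")
    ]
```

If the company is renamed, only its own document and triple change; nothing has to be updated in the person documents that link to it.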

[Figure: the person JSON document shown earlier]

Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: example business tables, transaction tables, and reference tables]

[Figure: the customer, order, product, and company JSON documents and the lookup document from the Same Realistic Customer Order Model slide]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
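As a sketch of the last bullet, a single lookup document can hold several reference lists, and an order can be checked against it. The split of the status values below is a reading of the flattened slide text, and the `status_problems` helper and the sample orders are hypothetical.

```python
# One lookup document holding two reference lists.
lookup_doc = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def status_problems(order_doc):
    """List every status in an order that is not in the lookup document."""
    problems = []
    order = order_doc["order"]
    if order["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append("orderStatus: " + order["orderStatus"])
    for item in order["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookup_doc["productOrderStatus"]:
            problems.append("productOrderStatus: " + status)
    return problems


good_order = {"orderId": 1, "order": {
    "orderStatus": "Shipped",
    "orderedProducts": [{"product": {"productIdFk": 1111,
                                     "productOrderStatus": "Shipped"}}],
}}
bad_order = {"orderId": 2, "order": {
    "orderStatus": "Lost",
    "orderedProducts": [{"product": {"productIdFk": 2222,
                                     "productOrderStatus": "Allocated"}}],
}}
```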

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below
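One way to read the subclassing in this document: the `schemas` array declares which types the document conforms to, and each declared type has its own top-level section. The sketch below assumes that nesting (`schema` wrapping `type`); the helper names `declared_types` and `is_a` are hypothetical.

```python
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person"}},
        {"schema": {"type": "customer"}},
        {"schema": {"type": "companyRelationships"}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
    ],
}


def declared_types(doc):
    """Return the type names a document says it conforms to."""
    return [entry["schema"]["type"] for entry in doc.get("schemas", [])]


def is_a(doc, type_name):
    """True when the type is declared and its section is present in the document."""
    return type_name in declared_types(doc) and type_name in doc
```

Adding a new role to a person is then a matter of adding one schema entry and one section, without touching the existing ones.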

[Figure: the person JSON document shown earlier, in which the customer and companyRelationships sections subclass the person]

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
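The reason for unique property names can be seen in a toy index keyed only by property name. This is a simplified emulation, not how MarkLogic implements its indexes; `walk` and `index_by_property_name` are hypothetical helpers.

```python
from collections import defaultdict


def walk(node, name=None):
    """Yield (property name, scalar value) pairs, ignoring position in the hierarchy."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, name)
    else:
        yield name, node


def index_by_property_name(doc):
    """Build a name -> set of values map for one document."""
    index = defaultdict(set)
    for name, value in walk(doc):
        index[name].add(value)
    return index


# Generic names collide: "name" means two different things.
generic = index_by_property_name(
    {"name": "Mike Bowers", "employer": {"name": "Nabisco"}}
)
# Unique names keep each lookup unambiguous.
unique = index_by_property_name(
    {"personName": "Mike Bowers", "company": {"companyName": "Nabisco"}}
)
```

With generic names, a search for `name = "Nabisco"` would match this person document; with `personName` and `companyName`, it cannot.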

[Figure: the person JSON document shown earlier]

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId 11
person
  personName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free
  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
  ]
  personAddresses [
    address
      addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
      addressLocality Clinton addressRegion UT
      addressPostalCode 84015 addressCountry United States
  ]
  personEmails [
    email emailPurpose [ work primary ] emailAddress mikesomeworkcom
  ]
  personWebsites [
    website websitePurpose [ work ] websiteUrl httpwwwmikecom
  ]
  personPaymentMethods [
    paymentMethod paymentMethodType Visa paymentMethodStatus Verified
      creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  ]
customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

orderId 1
order
  orderNumber 111-11-111
  orderDate 2001-01-01T090000Z
  orderStatus Shipped
  customer
    customerId 11
    customerName Mike Bowers
  orderShipment
    shipper companyIdFk 777 companyName FedEx
    shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
  orderShippingAddress
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
  orderedProducts [
    product
      productIdFk 1111
      productName Oreo Thins Sandwich Cookies
      productOrderQty 3 productUnitPrice 350 productCondition New
      productWeightOz 11
      productOrderStatus Shipped
      productShippedDate 2001-01-01+0101
      seller sellerId 555 sellerName Nabisco
      supplier supplierId 555 supplierName Nabisco
    product
      productIdFk 2222
      productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
      productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111
product
  productCode oreo-111
  productStatus active
  productName Oreo Thins Sandwich Cookies
  productDescription YUMMY
  productCategories [ cookies chocolate cookies desserts snacks
  productTags [ cookie chocolate snack treat ]
  productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
  productAvailabilityDate 2001-01-01
  productListPrice 450
  productDiscountPrice 350
  productStandardCost 250
  productInventoryReorderLevel 100
  productInventoryTargetLevel 1000
  productVendor vendorId 555 companyName Nabisco
  productSuppliers [
    productSupplier supplierId 555 companyName Nabisco
      standardSupplierProductPrice 250 currentSupplierProductPrice 2
      supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
  productMeasures [
    productMeasure
      productMeasurePurpose productPackage productWeight 101
      productWidth 45 productHeight 17 productLength 67
    productMeasure
      productMeasurePurpose shippingPackage productWeight 1
      productWidth 5 productHeight 2 productLength 8
  d tW b it [

companyId 555
company
  companyName Nabisco
  companyStatus active
  companyPhones [
    phone phonePurpose sales
    phone phonePurpose support
    phone phonePurpose shipping
  companyAddresses [
    address
      addressPurpose [ shipping
      addressLocality East Hanove
      addressPostalCode 07936
  companyEmails [
    email emailPurpose sales
  companyWebsites [
    website websitePurpose home
  companyPaymentMethods [
    paymentMethod
      paymentMethodType Che
      paymentMethodAccountNumber 555
      paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05

orderLookupId 111111111
orderStatus [ New Invoiced Shipped Closed ]
productOrderStatus [ None Allocated Invoiced Shipped On Order No Stock ]

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.


Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
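The IO estimate above can be reproduced with a few lines of Python. This is only a worked version of the slide's arithmetic; the function name and its parameters are illustrative, and the per-join costs are the slide's figures, not measurements.

```python
def relational_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return the (low, high) IO estimate for reading one entity spread over `joins` tables.

    Each join is assumed to cost some index IOs plus some row IOs,
    using the ranges quoted on the slide.
    """
    low_per_join = index_ios[0] + row_ios[0]    # 3 + 1 = 4
    high_per_join = index_ios[1] + row_ios[1]   # 4 + 2 = 6
    return joins * low_per_join, joins * high_per_join


low, high = relational_io_range(13)
print(f"{low}-to-{high} IOs")  # 52-to-78 IOs
```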


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
]

The same data as relational tables (table: columns):
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
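As a rough illustration of the multi-valued address above, the Python sketch below keeps the address and its purposes in one document and, for comparison, flattens the same data into junction-table rows. The exact JSON shape (an `address` wrapper object inside the array) and both helper functions are assumptions made for this example.

```python
person = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],  # multi-valued property
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }}
    ],
}


def addresses_for(person_doc, purpose):
    """Return the addresses in the document that serve the given purpose."""
    return [
        entry["address"]
        for entry in person_doc["personAddresses"]
        if purpose in entry["address"]["addressPurpose"]
    ]


def junction_rows(person_doc):
    """Flatten the same data into (Person ID, Address ID, Address Purpose) rows."""
    rows = []
    for address_id, entry in enumerate(person_doc["personAddresses"], start=1):
        for purpose in entry["address"]["addressPurpose"]:
            rows.append((person_doc["personId"], address_id, purpose))
    return rows


print(addresses_for(person, "shipping")[0]["addressLocality"])  # Clinton
print(junction_rows(person))  # [(11, 1, 'shipping'), (11, 1, 'mailing')]
```

The document answers the "shipping address" question with a single lookup, while the relational form needs one junction row per purpose plus joins back to the address and person tables.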

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing

  paymentMethod paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111 driversLicenseState UT
]
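A short Python sketch of working with such a mixed-type array follows. The property names and values come from the slide; the flat object shape and the `describe` helper are assumptions made for illustration.

```python
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    },
]


def describe(method):
    """Summarize a payment method, branching on its type field."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending in {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending in {method['bankAccountNumber'][-4:]}"
    return f"Unknown payment method type: {kind}"


for method in payment_methods:
    print(describe(method))
# Visa ending in 1234
# Checking account ending in 1111
```

Each record carries only the properties that apply to its own type, so no nullable columns or per-type tables are needed.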


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
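The idea that structure can be queried and changed like any other data can be sketched in plain Python. This is not MarkLogic query syntax; `has_key` and `has_nested` are hypothetical helpers, and the document shape is an assumption based on the examples in these slides.

```python
def has_key(node, key):
    """True if `key` appears as a property name anywhere in the JSON-like value."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def has_nested(node, parent, child):
    """True if some property named `parent` contains `child` at any depth below it."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == parent and has_key(value, child):
                return True
            if has_nested(value, parent, child):
                return True
        return False
    if isinstance(node, list):
        return any(has_nested(item, parent, child) for item in node)
    return False


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

print(has_key(doc, "hidePhoneNumber"))                     # True
print(has_nested(doc, "personPhones", "hidePhoneNumber"))  # True
print(has_nested(doc, "phone", "personBirthDate"))         # False

# Changing structure is just an update to the data.
doc["person"]["personEmails"] = []
print(has_key(doc, "personEmails"))                        # True
```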

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

personId 11

person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 575
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
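The JSON data types listed above can be seen by parsing a small document with Python's standard `json` module. The document below is illustrative only: the odd key, the name, the height value, and the `mixed` array are made up for this sketch.

```python
import json

text = """
{
  "personId": 11,
  "a key with spaces, symbols & ünïcödé": "allowed",
  "person": {
    "personName": "Zoë Ångström 李",
    "personHeightInFeet": 5.75,
    "hidePhoneNumber": true,
    "personMaidenName": null,
    "mixed": [1, "two", 3.5, false, {"nested": []}]
  }
}
"""

doc = json.loads(text)
person = doc["person"]

print(type(doc["personId"]).__name__)              # int   (integer number)
print(type(person["personHeightInFeet"]).__name__) # float (decimal number)
print(type(person["personName"]).__name__)         # str   (Unicode string)
print(type(person["hidePhoneNumber"]).__name__)    # bool
print(person["personMaidenName"])                  # None  (JSON null)
print([type(v).__name__ for v in person["mixed"]])
# ['int', 'str', 'float', 'bool', 'dict']

# Non-ASCII text survives a round trip unchanged.
assert json.loads(json.dumps(doc, ensure_ascii=False)) == doc
```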

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths

[Figure: graph diagram of typed nodes joined by labeled edges. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. JSON documents (Customers, Orders, Products, Vendors) and XML Documentation are connected by labeled relationships: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

XML document shown in the figure:
Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size   Dose UOM
Minicillan    Drugs R Us            200         mg
Maxicillan    Canada4Less Drugs     400         mg
Minicillan    Drug USA              150         mg

Inside and Out




One Person JSON Doc = 13 Tables


Relational tables (table: columns)

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
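To make the document-versus-tables comparison concrete, here is a minimal Python sketch that shreds part of one person document into rows for two of the tables listed above. The `shred_person` function, the JSON shape, and the choice to emit one phone row per purpose are all assumptions for illustration; a full version would populate every table.

```python
def shred_person(doc):
    """Split one person document into rows for the Persons and Person Phones tables."""
    person_id = doc["personId"]
    person = doc["person"]
    tables = {"Persons": [], "Person Phones": []}

    tables["Persons"].append({
        "Person ID": person_id,
        "Name": person["personName"],
        "Birth Date": person["personBirthDate"],
    })

    # One row per (phone, purpose) pair, since the table has a single Phone Purpose column.
    for phone_id, entry in enumerate(person.get("personPhones", []), start=1):
        phone = entry["phone"]
        for purpose in phone["phonePurpose"]:
            tables["Person Phones"].append({
                "Phone ID": phone_id,
                "Person ID": person_id,
                "Phone Purpose": purpose,
                "Phone Number": phone["phoneNumber"],
            })
    return tables


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personBirthDate": "1981-01-01",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ],
    },
}

tables = shred_person(person_doc)
for name, rows in tables.items():
    print(name, len(rows))
# Persons 1
# Person Phones 4
```

Reading the person back from the relational form means joining these rows together again, which is the cost the single document avoids.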

More Realistic Customer Order Model

[Figure: entity-relationship diagram made up of many relational tables; the table names shown in the slide are placeholders.]

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities


[Diagram: the Customer Order model as four kinds of JSON documents (Customers, Orders, Products, Vendors), linked by graph relationships labeled "Product that was ordered" and "Vendor who sold the product"]
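A minimal Python sketch of the two bullets on this slide: each business entity is one JSON-style document, and relationships between entities are labeled edges. The document keys such as "order/1", the predicate names, and the related helper are illustrative assumptions; they are not taken from the slides or from any MarkLogic API.

```python
import json

# Each business entity is a single self-contained document (assumed shapes).
documents = {
    "customer/11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "order/1": {"orderId": 1, "order": {"orderNumber": "111-11-111"}},
    "product/1111": {"productId": 1111,
                     "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "vendor/555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

# Graph relationships are (subject, predicate, object) triples.
# The predicate names are hypothetical labels for the arrows in the diagram.
triples = [
    ("order/1", "placedByCustomer", "customer/11"),
    ("order/1", "productOrdered", "product/1111"),
    ("product/1111", "soldByVendor", "vendor/555"),
]


def related(subject, predicate):
    """Return the documents linked from `subject` by `predicate`."""
    return [documents[o] for s, p, o in triples if s == subject and p == predicate]


ordered = related("order/1", "productOrdered")
vendors = related("product/1111", "soldByVendor")
print(json.dumps(ordered[0]["product"]))
print(vendors[0]["company"]["companyName"])
```

Because the meaning of each link lives in its predicate, a new kind of relationship is one more triple rather than a new join table.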

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Same Realistic Customer Order Model

Relational Modeling Normalize

1. Normalize

• Make each attribute single valued

  – Create one column per attribute

  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins

• Group attributes into tables, ensuring each table has one coherent context

• Assign one primary key to each table

• Eliminate duplicate attributes across tables
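As a rough illustration of the flattening step, the following Python sketch uses the standard sqlite3 module to move a multivalued attribute (a person's phone numbers) into a one-to-many table. The table and column names are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- The multivalued attribute becomes its own table with a foreign key.
    CREATE TABLE phone (
        phone_id     INTEGER PRIMARY KEY,
        person_id    INTEGER NOT NULL REFERENCES person(person_id),
        phone_number TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.executemany(
    "INSERT INTO phone (person_id, phone_number) VALUES (?, ?)",
    [(11, "+1 (111) 111-1111"), (11, "+1 (111) 111-1112")],
)

# Reading the person back with phone numbers now requires a join.
rows = conn.execute("""
    SELECT p.name, ph.phone_number
    FROM person AS p
    JOIN phone AS ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_number
""").fetchall()
print(rows)
```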


[Diagram panels: One-to-one, One-to-many, Many-to-many, Reference Tables]

DocGraph Modeling Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active",
      "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06",
      "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"],
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}
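As a rough sketch of the embedding step on this slide, the Python below folds flat, relational-style phone rows into one person document. Each phone number is stored once and its purposes become an array, which is one reading of "each embedded structure should be normalized". The row layout and the embed_phones helper are assumptions made for illustration.

```python
import json

# Flat rows, as a relational join might return them (illustrative data).
person_row = {"personId": 11, "personName": "Mike Bowers"}
phone_rows = [
    {"personId": 11, "phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
    {"personId": 11, "phonePurpose": "primary", "phoneNumber": "+1 (111) 111-1111"},
    {"personId": 11, "phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
]


def embed_phones(person, phones):
    """Build one person document with phones embedded as an array.

    Each phone number appears once; its purposes become a multi-valued
    array instead of repeated rows.
    """
    purposes_by_number = {}
    for row in phones:
        if row["personId"] != person["personId"]:
            continue
        purposes_by_number.setdefault(row["phoneNumber"], []).append(row["phonePurpose"])
    return {
        "personId": person["personId"],
        "person": {
            "personName": person["personName"],
            "personPhones": [
                {"phone": {"phonePurpose": purposes, "phoneNumber": number}}
                for number, purposes in purposes_by_number.items()
            ],
        },
    }


person_doc = embed_phones(person_row, phone_rows)
print(json.dumps(person_doc, indent=2))
```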

DocGraph Modeling Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing

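The idea of "denormalizing without denormalizing" can be imitated in plain Python. The sketch below does not use TDE or the Optic API, whose syntax the slides do not show. It only mimics the pattern: documents stay normalized, index rows are derived from them automatically, and a query joins linked documents through those rows. The projection rules, the personEmail and orderedBy names, and both helpers are hypothetical.

```python
# Documents stay normalized: the order only holds a key to the person.
docs = {
    "person/11": {"personId": 11,
                  "person": {"personName": "Mike Bowers",
                             "personEmail": "mike@somework.com"}},
    "order/1": {"orderId": 1,
                "order": {"orderNumber": "111-11-111", "customerId": 11}},
}


def project_triples(uri, doc):
    """Emit index rows from a document, the way a template might (assumed rules)."""
    if "person" in doc:
        yield (uri, "personName", doc["person"]["personName"])
        yield (uri, "personEmail", doc["person"]["personEmail"])
    if "order" in doc:
        yield (uri, "orderNumber", doc["order"]["orderNumber"])
        yield (uri, "orderedBy", f"person/{doc['order']['customerId']}")


# The toy "triple index" is derived from the documents, never hand-maintained.
triple_index = [t for uri, doc in docs.items() for t in project_triples(uri, doc)]


def lookup(subject, predicate):
    """Return all objects recorded for a subject and predicate."""
    return [o for s, p, o in triple_index if s == subject and p == predicate]


def order_with_customer(order_uri):
    """Join order data to the linked person's data through the index."""
    (person_uri,) = lookup(order_uri, "orderedBy")
    return {
        "orderNumber": lookup(order_uri, "orderNumber")[0],
        "customerName": lookup(person_uri, "personName")[0],
        "customerEmail": lookup(person_uri, "personEmail")[0],
    }


result = order_with_customer("order/1")
print(result)
```

The order document never stores the customer's email, yet the query returns it, because the join happens in the derived index rather than in copied data.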

Relational Modeling Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context


[Diagram panels: Business Tables, Transaction Tables, Reference Tables]


DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc
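A small Python sketch of two of these points, under stated assumptions: the order (a transaction) carries its own copy of the customer name and shipping address, and the status lists live together in one lookup document. The create_order helper and the changed locality "Layton" are hypothetical; the status values come from the lookup document in the customer order model.

```python
import copy

person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }}
        ],
    },
}

# Several lookup lists combined in a single document.
lookup_doc = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def create_order(order_id, person, status):
    """Build an order that contains everything about the transaction,
    including a copy of the customer name and shipping address."""
    if status not in lookup_doc["orderStatus"]:
        raise ValueError(f"unknown order status: {status}")
    shipping = next(
        entry["address"]
        for entry in person["person"]["personAddresses"]
        if "shipping" in entry["address"]["addressPurpose"]
    )
    address = {k: copy.deepcopy(v) for k, v in shipping.items()
               if k != "addressPurpose"}
    return {
        "orderId": order_id,
        "order": {
            "orderStatus": status,
            "customer": {"customerId": person["personId"],
                         "customerName": person["person"]["personName"]},
            "orderShippingAddress": address,
        },
    }


order_doc = create_order(1, person_doc, "Shipped")

# A later change to the person (hypothetical move) leaves the order untouched.
person_doc["person"]["personAddresses"][0]["address"]["addressLocality"] = "Layton"
print(order_doc["order"]["orderShippingAddress"]["addressLocality"])
```

Copying the address into the order is intentional here: the order records where the shipment actually went, not where the customer lives today.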

Relational Modeling Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational because it hides the purpose of the model


DocGraph Modeling Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the subclassing in the customer and companyRelationships sections of the person document shown under DocGraph Modeling Normalize
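One way to read the subclassing on this slide is that a general person document gains a role by gaining an optional section. The Python sketch below assumes that reading; the roles helper and the second person ("Pat Example") are hypothetical.

```python
def roles(person_doc):
    """List the roles a person document plays, based on which optional
    sections and company relationship types it contains."""
    found = ["person"]
    if "customer" in person_doc:
        found.append("customer")
    for relationship in person_doc.get("companyRelationships", []):
        found.extend(relationship["company"]["personCompanyRelationship"])
    return found


mike = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
    ],
}

# A person with no subclass sections is still a valid person document.
pat = {"personId": 12, "person": {"personName": "Pat Example"}}

mike_roles = roles(mike)
pat_roles = roles(pat)
print(mike_roles)
print(pat_roles)
```

The property names keep each role visible in the document itself, so the general person shape does not obscure what a particular person is.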

Relational Modeling Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
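A hedged sketch of the denormalizing step using Python's standard sqlite3 module: the customer name is copied into the order table so a common read no longer needs a join. The table and column names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    -- customer_name is deliberately duplicated here to avoid a join on reads.
    CREATE TABLE customer_order (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
        customer_name TEXT NOT NULL,
        order_number  TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO customer_order VALUES (1, 11, 'Mike Bowers', '111-11-111')")

# Normalized read: join to fetch the name.
joined = conn.execute("""
    SELECT o.order_number, c.customer_name
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
""").fetchall()

# Denormalized read: the copied column answers the same question without a join.
denormalized = conn.execute(
    "SELECT order_number, customer_name FROM customer_order"
).fetchall()
print(joined, denormalized)
```

The cost of this tuning is that every copy of the name has to be kept in sync when the customer record changes.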


DocGraph Modeling Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality (range) indexes based on property name or hierarchical path

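To show why unique property names matter when an equality index is keyed by name alone, here is a toy Python model. It is not MarkLogic's index implementation; the walk and build_index helpers and both sets of documents are assumptions made for illustration.

```python
from collections import defaultdict


def walk(node, name=None):
    """Yield (property_name, scalar_value) pairs at any depth.

    Scalars inside an array are reported under the array's property name.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, name)
    elif name is not None:
        yield name, node


def build_index(docs):
    """Map (property name, value) to the documents containing that pair,
    ignoring where the property sits in the hierarchy."""
    index = defaultdict(set)
    for uri, doc in docs.items():
        for name, value in walk(doc):
            index[(name, value)].add(uri)
    return index


# Generic names: "name" means different things in different documents.
generic_docs = {
    "person/11": {"id": 11, "name": "Mike Bowers"},
    "order/1": {"id": 1, "customer": {"id": 11, "name": "Mike Bowers"}},
}

# Unique names: each property name identifies exactly one meaning.
unique_docs = {
    "person/11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "order/1": {"orderId": 1,
                "order": {"customer": {"customerId": 11,
                                       "customerName": "Mike Bowers"}}},
}

generic_hits = build_index(generic_docs)[("name", "Mike Bowers")]
unique_hits = build_index(unique_docs)[("personName", "Mike Bowers")]
print(sorted(generic_hits))
print(sorted(unique_hits))
```

With generic names, a lookup for the person also returns the order; with unique names, the name alone is enough to target the person documents.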

Agenda

1. Database Modeling Paradigms

2. From Relational to Document Graph

3. Document Advantages


Document PROs Simplicity

Each Business Entity Is a JSON Document

• The five JSON documents in the customer order model equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database
  – You have to join 13 tables to retrieve all of a person's data
  – Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
  – 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
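The I/O estimate on this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `relational_io_range` is hypothetical, and the 13 joins and 4-to-6 IOs per join are the slide's own estimates, not measurements.

```python
def relational_io_range(joins, min_io_per_join=4, max_io_per_join=6):
    """Return the (low, high) IO estimate for reading one entity spread over `joins` tables."""
    return joins * min_io_per_join, joins * max_io_per_join


# Slide's assumption: 13 joins to reassemble one person.
low, high = relational_io_range(13)
print(f"relational: {low}-to-{high} IOs, document: 1 IO")
```

The low end comes from 4 IOs per join and the high end from 6 IOs per join, which is why the range is 52-to-78 rather than a single number.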


Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States"
  } }
]

Equivalent relational tables:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
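To make the contrast concrete, here is a minimal Python sketch that shreds a person document with a multi-valued `personAddresses` array into rows for the Persons, Addresses, and Persons Addresses tables. The function name `shred`, the snake_case column names, and the generated address IDs are assumptions made for illustration, and the nesting of each array item under an `address` key is assumed from the slide's layout.

```python
person_doc = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personAddresses": [
        {
            "address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }
        }
    ],
}


def shred(doc):
    """Split one person document into rows for three relational tables."""
    persons = [{"person_id": doc["personId"], "name": doc["personName"]}]
    addresses = []
    persons_addresses = []
    for address_id, item in enumerate(doc["personAddresses"], start=1):
        address = item["address"]
        addresses.append(
            {
                "address_id": address_id,  # surrogate key invented for this sketch
                "street": " ".join(address["addressStreet"]),
                "locality": address["addressLocality"],
                "region": address["addressRegion"],
                "postal_code": address["addressPostalCode"],
                "country": address["addressCountry"],
            }
        )
        # Each value in the multi-valued purpose array becomes its own junction row.
        for purpose in address["addressPurpose"]:
            persons_addresses.append(
                {
                    "person_id": doc["personId"],
                    "address_id": address_id,
                    "address_purpose": purpose,
                }
            )
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred(person_doc)
print(len(persons), len(addresses), len(persons_addresses))
```

One address with two purposes needs three tables and four rows in the relational form, while the document form keeps it as a single array entry.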

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed

• Mixing different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing"
  } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT"
  } }
]
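A multi-typed array is usually consumed by branching on a discriminator property. The sketch below, in Python, uses `paymentMethodType` from the slide as that discriminator; the function name `describe` and the output wording are hypothetical, and the wrapper object around each array item is omitted for brevity.

```python
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
    },
]


def describe(payment_method):
    """Summarize a payment method, reading only the fields its type defines."""
    kind = payment_method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending in {payment_method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending in {payment_method['bankAccountNumber'][-4:]}"
    return f"Unknown payment method type: {kind}"


for payment_method in payment_methods:
    print(describe(payment_method))
```

In a relational model the same data would typically need one table per payment type (as in the Visa and Bank payment tables of the person ERD), plus a junction table for each.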


Document PROs Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values

"personId": 11,
"person": {
  "personName": "Some very long name with many UTF-8 international characters…",
  "personBirthDate": "1982-02-02",
  "personHeightInFeet": 5.75,
  "personPhones": [
    { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
    { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
  ]
}
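Querying structure can be sketched in plain Python by walking a parsed document and reporting where a key occurs. This is an illustration of the idea only, not a MarkLogic query: the helper `find_key_paths` is hypothetical, and the sample document assumes each phone is wrapped in a `phone` object.

```python
def find_key_paths(node, key, path=()):
    """Yield the path to every occurrence of `key` anywhere in a nested JSON value."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from find_key_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from find_key_paths(item, key, path + (position,))


person_doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {
                "phone": {
                    "phonePurpose": ["home"],
                    "phoneNumber": "+1 (222) 222-2224",
                    "hidePhoneNumber": True,
                }
            },
        ],
    },
}

# Query structure: which phones carry the optional hidePhoneNumber key?
before = list(find_key_paths(person_doc, "hidePhoneNumber"))

# Change structure by updating data: add the key to the first phone.
person_doc["person"]["personPhones"][0]["phone"]["hidePhoneNumber"] = False
after = list(find_key_paths(person_doc, "hidePhoneNumber"))

print(before)
print(len(after))
```

No schema change is involved in the update: the new key simply appears in the one document that needs it.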

Document PROs Simple Easy Data Types

– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
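The handful of JSON types maps directly onto native types in most languages. As a small sketch, Python's standard `json` module parses the text below into `str`, `float`, `int`, `bool`, `list`, and `dict` values; the property names `personChildCount` and `mixed` and the sample name are made up for illustration.

```python
import json

raw = """
{
  "personName": "Zoë Müller",
  "personHeightInFeet": 5.75,
  "personChildCount": 2,
  "hidePhoneNumber": true,
  "mixed": ["text", 1, 2.5, false, {"nested": []}]
}
"""

doc = json.loads(raw)

# Map each top-level property to the name of the Python type it parsed into.
type_names = {key: type(value).__name__ for key, value in doc.items()}
# The array mixes types freely, including a nested object.
mixed_types = [type(item).__name__ for item in doc["mixed"]]

print(type_names)
print(mixed_types)
```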

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: graph of connected business entities. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, solution, purpose.]

[Figure: Customer Order model as linked documents. JSON documents: Customers, Orders, Products, Vendors. XML documents: Documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manual, etc.). Relationship labels: product that was ordered; vendor who sold the product; customer liked product; vendor received RMA on product.]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledgeWhat can RDF Graph and JSON do for data
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size   Dose UOM
Minicillan    Drugs R Us           200         mg
Maxicillan    Canada4Less Drugs    400         mg
Minicillan    Drug USA             150         mg


Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number

Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Websites: Website ID, URL

Persons Websites: Person ID, Website ID, Website Purpose, Is Primary

Companies Websites: Company ID, Website ID

Companies Addresses: Company ID, Address ID

Companies: Company ID

Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State

Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name

Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date

Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary

Persons Visa Payment Methods: Visa ID, Person ID, Is Primary


More Realistic Customer Order Model

Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: Customer Order model as four JSON document types (Customers, Orders, Products, Vendors) connected by the relationships "product that was ordered" and "vendor who sold the product".]


Relational Modeling Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure: one-to-one, one-to-many, many-to-many, and reference tables]

DocGraph Modeling Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
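The idea of projecting embedded triples into a separate index can be sketched in plain Python. This is an analogy only and does not use MarkLogic's TDE or its triple index: the function `build_triple_index` and the in-memory dictionary are hypothetical stand-ins, and the `triple` wrapper object is assumed from the slide's layout.

```python
from collections import defaultdict

person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
    ],
}


def build_triple_index(docs):
    """Index every embedded triple by (subject, predicate) so objects can be looked up directly."""
    index = defaultdict(list)
    for doc in docs:
        for item in doc.get("triples", []):
            triple = item["triple"]
            index[(triple["subject"], triple["predicate"])].append(triple["object"])
    return index


index = build_triple_index([person_doc])
print(index[(11, "hasChild")])
```

The document stays the single source of truth; the index is derived from it and can be rebuilt at any time.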

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

DocGraph Modeling Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
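"Denormalizing without denormalizing" means the company name is never copied into the person document; it is resolved through the link at query time. The Python sketch below imitates that with an in-memory lookup. It is not the Optic API: the function `employer_names` and the `company_docs` dictionary are hypothetical, while the IDs and names come from the sample person document.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
    ],
}

# Company documents keyed by companyId (a stand-in for a database lookup).
company_docs = {
    555: {"companyId": 555, "company": {"companyName": "Nabisco"}},
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
}


def employer_names(person, companies_by_id):
    """Follow hasEmployer triples and read each employer's name from the linked document."""
    names = []
    for item in person["triples"]:
        triple = item["triple"]
        if triple["predicate"] == "hasEmployer":
            company = companies_by_id[triple["object"]]
            names.append(company["company"]["companyName"])
    return names


print(employer_names(person_doc, company_docs))
```

If the company is renamed, only the company document changes, and every person linked to it sees the new name on the next query.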


Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

Business Tables, Transaction Tables, and Reference Tables

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not overgeneralize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the subclassing in the person document (personId 11) shown earlier: the customer and companyRelationships sections extend the base person section
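As a rough illustration of subclassing by composition, the Python sketch below treats each optional section of the person document as a "subclass" and reads the roles straight from the data. The `roles` helper and the dict layout are assumptions made for this example; they are not part of the original slides or of any MarkLogic API.

```python
# Sketch: a document's "subclasses" are simply the sections it contains.
# The dict below is a trimmed, hypothetical rendering of the sample document.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"companyName": "Nabisco",
         "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
        {"companyName": "Goliath Dairies",
         "personCompanyRelationship": ["independentContractorFor",
                                       "deliveryPersonFor"]},
    ],
}

def roles(doc):
    """Collect the roles a person document plays, based on its sections."""
    found = set()
    # The presence of a customer section makes this person a customer.
    if "customer" in doc:
        found.add("customer")
    # Each company relationship contributes its own role names.
    for relationship in doc.get("companyRelationships", []):
        found.update(relationship["personCompanyRelationship"])
    return found

print(sorted(roles(person_doc)))
```

A person with no extra sections simply has no roles, so new roles can be added later without changing existing documents.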

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality (range) indexes based on property name or hierarchical path
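Why unique property names matter can be seen with a toy name-keyed index in Python. This is only a sketch of the general idea, not MarkLogic's index implementation: `index_properties`, the document ids, and the generic `status` property are hypothetical.

```python
from collections import defaultdict

def index_properties(doc_id, node, index, name=None):
    """Record doc_id under every (property name, scalar value) pair.

    The position in the hierarchy is ignored on purpose: only the
    property name and the value form the index key.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            index_properties(doc_id, value, index, key)
    elif isinstance(node, list):
        for item in node:
            index_properties(doc_id, item, index, name)
    elif name is not None:
        index[(name, node)].add(doc_id)

# Unique names: personStatus and companyStatus never collide.
unique = defaultdict(set)
index_properties("person-11", {"person": {
    "personStatus": "active",
    "personPhones": [{"phonePurpose": ["mobile", "primary"]}]}}, unique)
index_properties("company-555", {"company": {"companyStatus": "active"}}, unique)

# Generic names: one "status" key matches both kinds of document.
generic = defaultdict(set)
index_properties("person-11", {"person": {"status": "active"}}, generic)
index_properties("company-555", {"company": {"status": "active"}}, generic)

print(unique[("personStatus", "active")])
print(generic[("status", "active")])
```

With unique names, a lookup on `personStatus` returns only person documents; with a generic `status` name, the same lookup also returns companies and needs extra filtering.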


Agenda

1. Database Modeling Paradigms

2. From Relational to Document Graph

3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document

• The five JSON documents shown earlier (person, order, product, company, and lookup lists) equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database

• You have to join 13 tables to retrieve all of a person's data

• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row

• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
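The IO estimate above is simple arithmetic, worked through here in Python. The constant and function names are made up for this sketch; the figures (13 tables, 3-to-4 index IOs, 1-to-2 row IOs) are the ones quoted on the slide.

```python
# Figures quoted on the slide; names are illustrative only.
TABLES_JOINED = 13
INDEX_IOS = (3, 4)   # low and high estimate of index IOs per join
ROW_IOS = (1, 2)     # low and high estimate of row IOs per join

def ios_per_join():
    """Low/high IOs for one join: index IOs plus row IOs."""
    return (INDEX_IOS[0] + ROW_IOS[0], INDEX_IOS[1] + ROW_IOS[1])

def relational_ios(joins=TABLES_JOINED):
    """Low/high IOs to assemble one person from all joined tables."""
    low, high = ios_per_join()
    return (low * joins, high * joins)

print(ios_per_join())    # (4, 6)
print(relational_ios())  # (52, 78)
```

Under these assumptions the relational read costs roughly 52 to 78 IOs, compared with the single IO claimed for fetching one document.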


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
]

Equivalent relational tables:

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
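To make the comparison concrete, here is a small Python sketch that shreds the multi-valued `personAddresses` array into the three relational tables listed above. The `shred_person` function, the snake_case column names, and the generated address ids are assumptions for illustration; the Is Primary column is left out for brevity.

```python
# Trimmed, hypothetical rendering of the sample person document.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"addressPurpose": ["shipping", "mailing"],
             "addressStreet": ["99 Smith Dr"],
             "addressLocality": "Clinton",
             "addressRegion": "UT",
             "addressPostalCode": "84015",
             "addressCountry": "United States"},
        ],
    },
}

def shred_person(doc):
    """Split one person document into rows for three relational tables."""
    person = doc["person"]
    persons = [{"person_id": doc["personId"], "name": person["personName"]}]
    addresses, person_addresses = [], []
    # Address ids are invented here; a real schema would assign its own keys.
    for address_id, address in enumerate(person["personAddresses"], start=1):
        addresses.append({
            "address_id": address_id,
            "street": " ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # One junction row per purpose: the array becomes many rows.
        for purpose in address["addressPurpose"]:
            person_addresses.append({"person_id": doc["personId"],
                                     "address_id": address_id,
                                     "address_purpose": purpose})
    return persons, addresses, person_addresses

persons, addresses, person_addresses = shred_person(person_doc)
print(len(persons), len(addresses), len(person_addresses))  # 1 1 2
```

One address with two purposes already produces four rows across three tables, while the JSON form keeps it as a single nested value.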

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed

• Putting different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111 driversLicenseState UT
]
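A minimal Python sketch of working with such a mixed-type array follows. The `describe` helper and the choice to branch on which keys are present are assumptions made for this example, not something the slides prescribe.

```python
# Two differently shaped records in one array, taken from the sample above.
payment_methods = [
    {"paymentMethodType": "Visa",
     "paymentMethodStatus": "Verified",
     "creditCardNumber": "1234123412341234",
     "nameOnCreditCard": "MIKE BOWERS"},
    {"paymentMethodType": "Checking",
     "paymentMethodStatus": "Verified",
     "bankRoutingNumber": "11111111",
     "bankAccountNumber": "11111111",
     "nameOnBankAccount": "Michael Bowers"},
]

def describe(method):
    """Summarize a payment method based on the fields it actually has."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"

for method in payment_methods:
    print(describe(method))
# Visa card ending 1234
# Checking account ending 1111
```

In a relational design, the same data would typically need either one wide table with many nullable columns or one table per payment type.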


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data (just write a query)
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
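To make "structure is data" concrete, here is a minimal Python sketch that treats the key paths of a parsed JSON document as queryable values. The document shape (each array entry wrapped in a `phone` key) is an assumed reading of the slide's sample, and `key_paths` is a hypothetical helper, not a MarkLogic API.

```python
import json

# Assumed shape of the slide's sample person document.
doc = json.loads("""
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personPhones": [
      {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
      {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                 "hidePhoneNumber": true}}
    ]
  }
}
""")


def key_paths(node, prefix=()):
    """Yield every key path in a parsed JSON value (array positions are skipped)."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + (key,)
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, prefix)


paths = set(key_paths(doc))

# Query for presence of a key anywhere in the document.
has_hidden_phone = any(path[-1] == "hidePhoneNumber" for path in paths)

# Query for a nested relationship: phoneNumber under phone under personPhones.
phone_under_person = ("person", "personPhones", "phone", "phoneNumber") in paths

# Changing the structure is an ordinary data update.
doc["person"]["personNickname"] = "Mikey"
```

In a database the same questions would be asked through its query language; the sketch only shows that keys and nesting are values a program can inspect and change.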

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 5.75
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]

Document PROs: Simple, Easy Data Types
• Attribute names have unlimited length and can contain any characters
• Strings have unlimited size and can contain any internationalized UTF-8 characters
• Numbers contain integers or decimals
• Booleans are true or false
• Arrays can have values of any type and can mix types
• Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
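The data types listed above can be exercised with Python's standard `json` module. This is a small sketch; the property names and values are illustrative assumptions rather than part of the slide's model.

```python
import json

person = {
    "personName": "Zoë Åström-Müller 山田",          # any UTF-8 characters
    "personHeightInFeet": 5.75,                      # decimal number
    "personId": 11,                                  # integer number
    "hidePhoneNumber": True,                         # boolean
    "mixedArray": ["text", 42, 1.5, False, None, {"nested": []}],
    "a key with spaces, punctuation & ünïcode": "allowed",
}

# Objects can be sparsely populated: two entries of one array need not share keys.
phones = [
    {"phoneNumber": "+1 (222) 222-2222"},
    {"phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": True},
]

# Serialize and parse again; the values and their types survive the round trip.
text = json.dumps({"person": person, "phones": phones}, ensure_ascii=False)
round_tripped = json.loads(text)
```

Note that JSON itself has a single number type; the integer and decimal distinction seen here comes from how Python's parser maps numbers.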

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … The relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of business entities connected by labeled relationships. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. JSON documents: Customers, Orders, Products, Vendors. Other content: an XML document and Documentation. Relationship labels: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

XML document shown in the figure:

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size   Dose UOM
Minicillan    Drugs R Us           200         mg
Maxicillan    Canada4Less Drugs    400         mg
Minicillan    Drug USA             150         mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49



Same Realistic Customer Order Model

• A JSON document is a business entity

• Meaningful graph relationships connect business entities

[Figure: Customer Order model as JSON documents (Customers, Orders, Products, Vendors) linked by the relationships "Product that was ordered" and "Vendor who sold the product".]

personId 11
person
  personName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free
  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
  ]
  personAddresses [
    address
      addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
      addressLocality Clinton addressRegion UT
      addressPostalCode 84015 addressCountry United States
  ]
  personEmails [
    email emailPurpose [ work primary ] emailAddress mikesomeworkcom
  ]
  personWebsites [
    website websitePurpose [ work ] websiteUrl httpwwwmikecom
  ]
  personPaymentMethods [
    paymentMethod paymentMethodType Visa paymentMethodStatus Verified
      creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  ]
customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

orderId 1
order
  orderNumber 111-11-111
  orderDate 2001-01-01T090000Z
  orderStatus Shipped
  customer
    customerId 11
    customerName Mike Bowers
  orderShipment
    shipper companyIdFk 777 companyName FedEx
    shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
  orderShippingAddress
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
  orderedProducts [
    product
      productIdFk 1111
      productName Oreo Thins Sandwich Cookies
      productOrderQty 3 productUnitPrice 350 productCondition New
      productWeightOz 11
      productOrderStatus Shipped
      productShippedDate 2001-01-01+0101
      seller sellerId 555 sellerName Nabisco
      supplier supplierId 555 supplierName Nabisco
    product
      productIdFk 2222
      productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
      productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111
product
  productCode oreo-111
  productStatus active
  productName Oreo Thins Sandwich Cookies
  productDescription YUMMY
  productCategories [ cookies chocolate cookies desserts snacks
  productTags [ cookie chocolate snack treat ]
  productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
  productAvailabilityDate 2001-01-01
  productListPrice 450
  productDiscountPrice 350
  productStandardCost 250
  productInventoryReorderLevel 100
  productInventoryTargetLevel 1000
  productVendor vendorId 555 companyName Nabisco
  productSuppliers [
    productSupplier supplierId 555 companyName Nabisco
      standardSupplierProductPrice 250 currentSupplierProductPrice 2
      supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
  productMeasures [
    productMeasure
      productMeasurePurpose productPackage productWeight 101
      productWidth 45 productHeight 17 productLength 67
    productMeasure
      productMeasurePurpose shippingPackage productWeight 1
      productWidth 5 productHeight 2 productLength 8
  d tW b it [

companyId 555
company
  companyName Nabisco
  companyStatus active
  companyPhones [
    phone phonePurpose sales
    phone phonePurpose support
    phone phonePurpose shipping
  companyAddresses [
    address
      addressPurpose [ shipping
      addressLocality East Hanove
      addressPostalCode 07936
  companyEmails [
    email emailPurpose sales
  companyWebsites [
    website websitePurpose home
  companyPaymentMethods [
    paymentMethod
      paymentMethodType Che
      paymentMethodAccountNumber 555
      paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05

orderLookupId 111111111
orderStatus [ New Invoiced Shipped Closed ]
productOrderStatus [ None Allocated Invoiced Shipped On Order No Stock ]


Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure: table relationship types: One-to-one, One-to-many, Many-to-many, Reference Tables]
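As an illustration of the normalization steps above, the following sketch uses Python's built-in `sqlite3` module to flatten one person's multi-valued phones and phone purposes into tables. The table and column names are assumptions chosen for the example; the slides do not give a relational schema at this level of detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,
    person_name TEXT NOT NULL
);
-- One-to-many: a person has many phones.
CREATE TABLE phone (
    phone_id     INTEGER PRIMARY KEY,
    person_id    INTEGER NOT NULL REFERENCES person(person_id),
    phone_number TEXT NOT NULL
);
-- Reference table for the allowed purposes.
CREATE TABLE purpose (
    purpose_id   INTEGER PRIMARY KEY,
    purpose_name TEXT NOT NULL UNIQUE
);
-- Many-to-many: a phone can serve several purposes.
CREATE TABLE phone_purpose (
    phone_id   INTEGER NOT NULL REFERENCES phone(phone_id),
    purpose_id INTEGER NOT NULL REFERENCES purpose(purpose_id),
    PRIMARY KEY (phone_id, purpose_id)
);
""")

conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.executemany(
    "INSERT INTO phone VALUES (?, ?, ?)",
    [(1, 11, "+1 (111) 111-1111"), (2, 11, "+1 (111) 111-1112")],
)
conn.executemany(
    "INSERT INTO purpose VALUES (?, ?)",
    [(1, "mobile"), (2, "primary"), (3, "work")],
)
conn.executemany(
    "INSERT INTO phone_purpose VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 3)],
)

# Reassembling the person's phones takes three joins.
rows = conn.execute("""
    SELECT p.person_name, ph.phone_number, pu.purpose_name
    FROM person p
    JOIN phone ph ON ph.person_id = p.person_id
    JOIN phone_purpose pp ON pp.phone_id = ph.phone_id
    JOIN purpose pu ON pu.purpose_id = pp.purpose_id
    ORDER BY ph.phone_id, pu.purpose_id
""").fetchall()
```

Four tables and three joins are needed for a single multi-valued attribute, which is the cost the document model aims to avoid.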

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document

• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures

• Each embedded structure should be normalized

• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it

personId 11
schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]
triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null
]
person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free
  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
    phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
    phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
  ]
  personAddresses [
    address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
      addressLocality Clinton addressRegion UT
      addressPostalCode 84015 addressCountry United States
  ]
  personEmails [
    email emailPurpose [ work primary ] emailAddress mikesomeworkcom
    email emailPurpose [ personal ] emailAddress mikesomewherecom
  ]
  personWebsites [
    website websitePurpose [ work ] websiteUrl httpwwwmikecom
    website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike
  ]
  personPaymentMethods [
    paymentMethod paymentMethodType Visa paymentMethodStatus Verified
      creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
    paymentMethod paymentMethodType Checking paymentMethodStatus Verified
      bankRoutingNumber 11111111 bankAccountNumber 11111111
      nameOnBankAccount Michael Bowers
      driversLicenseNumber 11111111 driversLicenseState UT
  ]
customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111
companyRelationships [
  company companyIdFk 555 companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company companyIdFk 666 companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]
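The person document above keeps its multi-valued phones inside the entity instead of in separate tables. A minimal Python sketch of reading such an embedded structure follows; the exact nesting (each array entry wrapped in a `phone` key) is an assumed reading of the slide, and `phones_for` is a hypothetical helper.

```python
# Trimmed person document; only the properties needed for the example.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (111) 111-1113"}},
        ],
    },
}


def phones_for(doc, purpose):
    """Return the phone numbers in one document that list the given purpose."""
    return [
        entry["phone"]["phoneNumber"]
        for entry in doc["person"]["personPhones"]
        if purpose in entry["phone"]["phonePurpose"]
    ]


primary_phones = phones_for(person_doc, "primary")
work_phones = phones_for(person_doc, "work")
```

No join is involved: the phones and their purposes travel with the person, while each embedded phone still has one coherent set of attributes.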

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing
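The idea of "denormalizing without denormalizing" can be sketched in plain Python. This is only a toy stand-in for a triple index: it does not use TDE or the Optic API, the `hasName` predicate and the helper functions are hypothetical, and the names of the linked persons are invented for illustration.

```python
# Each document stores only its own data plus triples that link to other documents.
documents = {
    11: {"personId": 11,
         "triples": [
             {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
             {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
             {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
         ],
         "person": {"personName": "Mike Bowers"}},
    22: {"personId": 22, "triples": [], "person": {"personName": "Spouse Example"}},
    33: {"personId": 33, "triples": [], "person": {"personName": "Child One"}},
    44: {"personId": 44, "triples": [], "person": {"personName": "Child Two"}},
}


def build_triple_index(docs):
    """Collect embedded triples and project each person's name as a triple."""
    index = []
    for doc in docs.values():
        for entry in doc.get("triples", []):
            t = entry["triple"]
            index.append((t["subject"], t["predicate"], t["object"]))
        index.append((doc["personId"], "hasName", doc["person"]["personName"]))
    return index


def related_names(index, subject, predicate):
    """Follow a relationship and read the linked person's name from the index."""
    names = {s: o for s, p, o in index if p == "hasName"}
    return [names[o] for s, p, o in index
            if s == subject and p == predicate and o in names]


index = build_triple_index(documents)
children = related_names(index, 11, "hasChild")
spouse = related_names(index, 11, "hasSpouse")
```

The spouse's and children's names are never copied into document 11, yet a query over the index reads them as if they were, which is the effect the bullets describe.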


Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: Business Tables, Transaction Tables, Reference Tables]


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc
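The last point, combining several lookup tables in one document, can be sketched as follows. The lookup values are an assumed reading of the slide's lookup document (the extracted text runs the values together), `invalid_statuses` is a hypothetical helper, and "Backordered" is an invented value used only to trigger a validation failure.

```python
# One lookup document holds what would be two reference tables.
lookup_doc = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}

# Trimmed order transaction document.
order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}


def invalid_statuses(order_doc, lookup_doc):
    """Return (property, value) pairs whose value is not in the lookup document."""
    problems = []
    order = order_doc["order"]
    if order["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for entry in order["orderedProducts"]:
        status = entry["product"]["productOrderStatus"]
        if status not in lookup_doc["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


problems = invalid_statuses(order_doc, lookup_doc)
```

A single read of the lookup document is enough to validate every status in the transaction.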

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

[Excerpt of the personId 11 document; the triples and the remaining person properties are the same as in the DocGraph Modeling Normalize example.]

personId 11
schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]
person
  personStatus active
  personName Mike Bowers
customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111
companyRelationships [
  company companyIdFk 555 companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company companyIdFk 666 companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]
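One way to read the subclassing in this document is that each optional top-level section adds a role to the base person. The sketch below derives the roles from the sections that are present; `roles_of` is a hypothetical helper, the nesting is an assumed reading of the slide, and the second document is invented for contrast.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}

# A person with no optional sections (hypothetical document for contrast).
plain_person = {"personId": 22, "person": {"personName": "Plain Person"}}


def roles_of(doc):
    """List the roles a person document plays, based on the sections it contains."""
    roles = ["person"] if "person" in doc else []
    if "customer" in doc:
        roles.append("customer")
    for entry in doc.get("companyRelationships", []):
        roles.extend(entry["company"]["personCompanyRelationship"])
    return roles


roles = roles_of(person_doc)
plain_roles = roles_of(plain_person)
```

Because the section names state the role directly, the general person model keeps its purpose visible, which is the contrast with relational generalization drawn in the bullets.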

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
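Why unique property names matter can be seen with a toy index keyed only by property name and value. This is a simplified model written for illustration, not MarkLogic's actual index implementation; the documents and the `build_index` helper are assumptions.

```python
from collections import defaultdict


def index_values(doc_id, key, value, index):
    """Index every scalar under its property name, ignoring its position in the hierarchy."""
    if isinstance(value, dict):
        for child_key, child_value in value.items():
            index_values(doc_id, child_key, child_value, index)
    elif isinstance(value, list):
        for item in value:
            index_values(doc_id, key, item, index)
    else:
        index[(key, value)].add(doc_id)


def build_index(docs):
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        index_values(doc_id, None, doc, index)
    return index


# Generic names: a person and a company both use "name".
generic = {
    "person-1": {"person": {"name": "Jordan"}},
    "company-1": {"company": {"name": "Jordan"}},
}

# Unique names: the property name itself carries the context.
unique = {
    "person-1": {"person": {"personName": "Jordan"}},
    "company-1": {"company": {"companyName": "Jordan"}},
}

generic_hits = build_index(generic)[("name", "Jordan")]
unique_hits = build_index(unique)[("personName", "Jordan")]
```

With generic names the equality lookup matches both the person and the company; with unique names such as `personName` the same lookup is already specific, without needing a path.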


Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
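The IO estimate for the relational case can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the per-join ranges are the ones quoted on the slide, and the function name and defaults are illustrative assumptions, not measurements.

```python
def relational_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Estimate the (low, high) IO count for retrieving one entity via joins.

    Each join is assumed to cost some index IOs plus some row IOs,
    using the ranges quoted on the slide.
    """
    low = (index_ios[0] + row_ios[0]) * joins
    high = (index_ios[1] + row_ios[1]) * joins
    return low, high


# 13 joined tables for one person record
low, high = relational_io_range(13)
print(f"relational: {low}-to-{high} IOs; single document: 1 IO")
```

The point of the comparison is the ratio, not the exact numbers: real IO counts depend on caching, index depth, and storage layout.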

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States"
  } }
]

Relational tables needed for the same data:

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
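A small Python sketch of what a multi-valued property buys you: the address and its list of purposes live in one structure, so no junction table is needed to ask "which addresses are used for shipping?". The helper name `addresses_for` is hypothetical, and plain dictionaries stand in for a stored JSON document.

```python
person = {
    "personId": 11,
    "personAddresses": [
        {
            "address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }
        }
    ],
}


def addresses_for(doc, purpose):
    """Return every embedded address whose purpose list contains `purpose`."""
    return [
        entry["address"]
        for entry in doc.get("personAddresses", [])
        if purpose in entry["address"].get("addressPurpose", [])
    ]


shipping = addresses_for(person, "shipping")
print(shipping[0]["addressLocality"])
```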

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa",
      "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234",
      "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS",
      "creditCardValidationAddress": "mailing"
  } },
  { "paymentMethod": {
      "paymentMethodType": "Checking",
      "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111",
      "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111",
      "driversLicenseState": "UT"
  } }
]
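As a rough illustration of why mixed record types are natural in code, the Python sketch below walks an array holding a credit card and a checking account and branches on the keys each record carries. The `describe` helper is hypothetical, and the `paymentMethod` wrapper object from the slide is flattened here for brevity.

```python
payment_methods = [
    {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
     "creditCardNumber": "1234123412341234", "nameOnCreditCard": "MIKE BOWERS"},
    {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
     "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
     "nameOnBankAccount": "Michael Bowers"},
]


def describe(method):
    """Summarize a payment method, branching on which keys the record carries."""
    kind = method.get("paymentMethodType", "unknown")
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account at routing number {method['bankRoutingNumber']}"
    return f"unrecognized method: {kind}"


for m in payment_methods:
    print(describe(m))
```

In a relational schema the same data would usually need either one wide table with many null columns or a separate table per payment type.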

Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values
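A minimal Python sketch of "structure is data": the same document is first queried for the presence of a nested key and then restructured with an ordinary update. The `has_path` helper and the `phoneVerified` key are assumptions made for illustration; a document database would express the same idea in its own query language.

```python
def has_path(doc, path):
    """Return True if the nested key path exists; list items are searched."""
    if not path:
        return True
    if isinstance(doc, list):
        return any(has_path(item, path) for item in doc)
    if isinstance(doc, dict) and path[0] in doc:
        return has_path(doc[path[0]], path[1:])
    return False


person = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query structure: is this key present anywhere under the phones array?
print(has_path(person, ["person", "personPhones", "phone", "hidePhoneNumber"]))

# Change structure by updating data: add a new (hypothetical) key to one phone
person["person"]["personPhones"][0]["phone"]["phoneVerified"] = False
print(has_path(person, ["person", "personPhones", "phone", "phoneVerified"]))
```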

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
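To see the JSON data types in practice, this Python sketch parses a small document with the standard `json` module and prints the type each value maps to. The sample name, `personChildCount`, `mixedArray`, and `sparse` are invented for the example and are not part of the slide's model.

```python
import json

text = """
{
  "personName": "Zoë Ångström-李",
  "personHeightInFeet": 5.75,
  "personChildCount": 2,
  "hidePhoneNumber": true,
  "mixedArray": ["text", 42, 1.5, false, {"nested": []}],
  "sparse": {}
}
"""

doc = json.loads(text)

# Show which Python type each JSON value was parsed into
for key, value in doc.items():
    print(f"{key}: {type(value).__name__}")
```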

Document PROs Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. Codd
IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths

[Figure: graph of Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes, connected by labeled edges: "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", "solution"]

[Figure: Customer Order model combining JSON Customers, JSON Orders, JSON Products, JSON Vendors, and XML Documentation, with edge labels: "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", "Vendor received RMA on product"]

XML example shown in the figure:

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Same Realistic Customer Order Model
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Document PROs Simple Easy Data Types

Same Realistic Customer Order Model

Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure: relational diagrams labeled One-to-one, One-to-many, Many-to-many, and Reference Tables]
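The normalization steps above can be sketched in Python by flattening one nested person record into rows for three tables. The table and column names (`person_id`, `address_id`, `address_purpose`) and the simplified input shape are assumptions for illustration; the point is that each multi-valued attribute becomes rows in a junction table.

```python
def normalize_person(doc):
    """Split one person document into rows for three relational-style tables."""
    persons = [{"person_id": doc["personId"], "name": doc["personName"]}]
    addresses, person_addresses = [], []
    for address_id, addr in enumerate(doc["personAddresses"], start=1):
        addresses.append({
            "address_id": address_id,
            "street": addr["addressStreet"],
            "locality": addr["addressLocality"],
        })
        # One junction row per purpose replaces the multi-valued attribute
        for purpose in addr["addressPurpose"]:
            person_addresses.append({
                "person_id": doc["personId"],
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, addresses, person_addresses


doc = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personAddresses": [{
        "addressPurpose": ["shipping", "mailing"],
        "addressStreet": "99 Smith Dr",
        "addressLocality": "Clinton",
    }],
}
persons, addresses, person_addresses = normalize_person(doc)
print(len(persons), len(addresses), len(person_addresses))
```

Reassembling the person from these rows is what requires the joins counted on the Performance slide.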

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
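As a conceptual illustration only, the Python sketch below collects the triples embedded in a document into a lookup keyed by subject and predicate. It is not MarkLogic's TDE template syntax or its triple index; `build_triple_index` is a hypothetical in-memory stand-in that shows what "populating a triple index from documents" means.

```python
from collections import defaultdict


def build_triple_index(documents):
    """Index (subject, predicate) -> [objects] from each document's triples."""
    index = defaultdict(list)
    for doc in documents:
        for t in doc.get("triples", []):
            index[(t["subject"], t["predicate"])].append(t["object"])
    return index


person = {
    "personId": 11,
    "triples": [
        {"subject": 11, "predicate": "hasSpouse", "object": 22},
        {"subject": 11, "predicate": "hasChild", "object": 33},
        {"subject": 11, "predicate": "hasChild", "object": 44},
        {"subject": 11, "predicate": "likesProduct", "object": 1111},
        {"subject": 11, "predicate": "hasEmployer", "object": 666},
    ],
}

index = build_triple_index([person])
print(index[(11, "hasChild")])
```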

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
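The idea of "denormalizing without denormalizing" can be sketched in plain Python. This is only an illustration of the concept, not MarkLogic code: the in-memory `person_docs` dictionary, the `triples` list, and the `linked_values` helper are all hypothetical stand-ins for the database, the triple index, and a join. The spouse's name is an invented placeholder.

```python
# Illustrative only: plain Python standing in for documents, a triple index,
# and a join across them. None of these names are MarkLogic APIs.

person_docs = {
    11: {"personName": "Mike Bowers"},
    22: {"personName": "Spouse Example"},  # hypothetical linked document
}

# Triples as (subject, predicate, object), mirroring the "triples" array
# in the person document.
triples = [
    (11, "hasSpouse", 22),
    (11, "likesProduct", 1111),
]


def linked_values(subject_id, predicate, prop):
    """Follow triples from subject_id and read `prop` from each linked document."""
    results = []
    for s, p, o in triples:
        if s == subject_id and p == predicate and o in person_docs:
            results.append(person_docs[o].get(prop))
    return results


# The spouse's name is read through the link; it is never copied into document 11.
spouse_names = linked_values(11, "hasSpouse", "personName")
print(spouse_names)
```

The linked data stays in its own document, yet a query can read it as if it had been copied, which is the effect the bullets above describe.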

Relational Modeling: Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

Business Tables, Transaction Tables, Reference Tables

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

Relational Modeling: Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational, because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the subclassing in the document below (the customer and companyRelationships sections)

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
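One way to picture the subclassing is as a general person document whose extra sections act as subclasses. The sketch below assumes a trimmed, hypothetical copy of the person document and two hypothetical helpers, `roles` and `is_a`; it is not a MarkLogic feature, only a reading of the `schemas` array shown above.

```python
# Trimmed, hypothetical copy of the person document; field names follow the slide.
person_doc = {
    "personId": 11,
    "schemas": [
        {"type": "person"},
        {"type": "customer"},
        {"type": "companyRelationships"},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {
            "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        },
    ],
}


def roles(doc):
    """Return the declared schema types whose section is present in the document."""
    return [s["type"] for s in doc["schemas"] if s["type"] in doc]


def is_a(doc, role):
    """True when the document carries the given subclass section."""
    return role in roles(doc)


print(roles(person_doc))
print(is_a(person_doc, "customer"), is_a(person_doc, "supplier"))
```

Adding a new subclass is then a matter of adding a section and a schema entry to the document, while the purpose of each section stays visible in its name.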

Relational Modeling: Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes based on property name or hierarchical path
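The reason for unique property names can be sketched with a toy index keyed only by property name. This is an assumption-laden illustration in plain Python, not how MarkLogic implements its indexes; `build_name_index` and both sample documents are hypothetical.

```python
from collections import defaultdict


def build_name_index(node, index=None, name=None):
    """Map each property name to every scalar value stored under it, at any depth."""
    if index is None:
        index = defaultdict(list)
    if isinstance(node, dict):
        for key, value in node.items():
            build_name_index(value, index, key)
    elif isinstance(node, list):
        for item in node:
            build_name_index(item, index, name)
    elif name is not None:
        index[name].append(node)
    return index


# Generic names collide: a lookup on "name" cannot tell a person from a company.
ambiguous = {"person": {"name": "Mike Bowers"}, "company": {"name": "Nabisco"}}

# Unique names keep each lookup precise without needing the hierarchical path.
unique = {
    "person": {"personName": "Mike Bowers"},
    "company": {"companyName": "Nabisco"},
}

ambiguous_index = build_name_index(ambiguous)
unique_index = build_name_index(unique)
print(ambiguous_index["name"])
print(unique_index["personName"])
```

With an index that ignores hierarchy, a name such as `personName` already says which entity the value belongs to, which is why the example documents prefix every property.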

Agenda

1. Database Modeling Paradigms

2. From Relational to Document Graph

3. Document Advantages

Document PROs: Simplicity

Each Business Entity Is a JSON Document

• The five JSON documents in this example (person, order, product, company, and lookup) equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs: Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database

– You have to join 13 tables to retrieve all of a person's data

– Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row

– 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
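The IO estimate above is simple arithmetic and can be checked with a few lines of Python. The function name `relational_io_range` and its parameters are hypothetical; the per-join figures are the ones quoted in the bullets, not measurements.

```python
def relational_io_range(tables, index_ios=(3, 4), row_ios=(1, 2)):
    """Return the (low, high) IO estimate for joining `tables` tables.

    Each join is assumed to cost index IOs plus row IOs, using the
    ranges quoted on the slide.
    """
    per_join_low = index_ios[0] + row_ios[0]    # 3 + 1 = 4 IOs
    per_join_high = index_ios[1] + row_ios[1]   # 4 + 2 = 6 IOs
    return tables * per_join_low, tables * per_join_high


low, high = relational_io_range(13)
print(low, high)  # 52 78
```

Against that range, the single IO claimed for fetching one document is the comparison the slide is drawing.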

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Equivalent relational tables:

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
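To make the contrast concrete, the sketch below shreds the one multi-valued address into rows for the Addresses and Persons Addresses tables. It is a hypothetical illustration: `shred_addresses` is an invented helper, and the generated `addressId` values are surrogate keys assumed for the example.

```python
person_doc = {
    "personId": 11,
    "personAddresses": [
        {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        },
    ],
}


def shred_addresses(doc):
    """Split one document into rows for the Addresses and Persons Addresses tables."""
    addresses, person_addresses = [], []
    for address_id, addr in enumerate(doc["personAddresses"], start=1):
        addresses.append({
            "addressId": address_id,  # surrogate key, assumed for illustration
            "street": " ".join(addr["addressStreet"]),
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postalCode": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        # Each purpose becomes its own row in the join table.
        for purpose in addr["addressPurpose"]:
            person_addresses.append({
                "personId": doc["personId"],
                "addressId": address_id,
                "addressPurpose": purpose,
            })
    return addresses, person_addresses


addresses, person_addresses = shred_addresses(person_doc)
print(len(addresses), len(person_addresses))
```

One nested array in the document turns into one address row plus one join row per purpose, and reading it back requires joining the tables again.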

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed

• Mixing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
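A short Python sketch shows why a mixed array is natural in code: each record is inspected for the properties it actually has. The `describe` helper is hypothetical, and the records are trimmed copies of the two payment methods above.

```python
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "creditCardNumber": "1234123412341234",
        "nameOnCreditCard": "MIKE BOWERS",
    },
    {
        "paymentMethodType": "Checking",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    },
]


def describe(method):
    """Summarize a payment method based on which properties it carries."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


summaries = [describe(m) for m in payment_methods]
print(summaries)
```

In a relational model the same data would typically need either one wide table with many empty columns or a separate table per payment type.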

Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values
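The idea that structure can be queried and changed like any other data can be sketched in plain Python. This is only an illustration on an in-memory dictionary, not a MarkLogic query; the helper names `walk`, `has_key`, and `is_nested_under` are hypothetical, and the JSON punctuation of the sample document is assumed.

```python
from typing import Any, Iterator, Tuple


def walk(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, Any]]:
    """Yield (path, value) for every key and array item in a JSON-like value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path + (key,), value
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield path + (index,), item
            yield from walk(item, path + (index,))


def has_key(doc: Any, key: str) -> bool:
    """Query for the presence of a key anywhere in the document."""
    return any(path[-1] == key for path, _ in walk(doc))


def is_nested_under(doc: Any, ancestor: str, descendant: str) -> bool:
    """Query for a nested relationship: descendant appears below ancestor."""
    return any(
        path[-1] == descendant and ancestor in path[:-1]
        for path, _ in walk(doc)
    )


# Assumed shape of a small person document (punctuation reconstructed).
doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (111) 111-1111"}}
        ],
    },
}

# Changing the structure is just a data update.
doc["person"]["personNickname"] = "Mikey"
```

With this sketch, `has_key(doc, "personNickname")` is true after the update, and `is_nested_under(doc, "personPhones", "phoneNumber")` answers a nesting question without any schema.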

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have unlimited number of attributes, can be nested, and can be sparsely populated

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 5.75
  personPhones [
    phone  phonePurpose [ mobile ]  phoneNumber +1 (222) 222-2222
    phone  phonePurpose [ home ]  phoneNumber +1 (222) 222-2224  hidePhoneNumber true
  ]
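A short Python sketch of these data types, using the standard `json` module. The JSON text below is an assumed reconstruction of the slide's sample (the name is a made-up placeholder), and the second phone is deliberately sparse.

```python
import json

# Assumed JSON text; property names follow the slide, values are illustrative.
text = """
{
  "personId": 11,
  "person": {
    "personName": "Søren Åström-García 李明",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
      {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
       "hidePhoneNumber": true}
    ],
    "mixedArray": ["text", 2, false, {"nested": []}]
  }
}
"""

doc = json.loads(text)
person = doc["person"]

# Each JSON type maps onto one Python type.
type_names = {key: type(value).__name__ for key, value in person.items()}

# Sparse objects: a missing property is simply absent, so supply a default.
hidden = [phone.get("hidePhoneNumber", False) for phone in person["personPhones"]]

# Serializing and parsing again preserves the document, UTF-8 included.
round_trip = json.loads(json.dumps(doc, ensure_ascii=False))
```

Dates have no JSON type of their own, so `personBirthDate` arrives as a string here; how it is typed and indexed is a separate modeling decision.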

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. CODD
IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of linked entities. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order, a multi-model example. Document sets: Customers (JSON), Orders (JSON), Products (JSON), Documentation (XML, illustrated by a hospital operation record with Hospital Name, Operation Number, Operation Type, Surgeon Name, and a drug table), Vendors (JSON). Relationship labels: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name:    John Hopkins
Operation Number: 13
Operation Type:   Heart Transplant
Surgeon Name:     Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size   Dose UOM
Minicillan    Drugs R Us           200         mg
Maxicillan    Canada4Less Drugs    400         mg
Minicillan    Drug USA             150         mg


Relational Modeling: Normalize

1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables

[Figure: example tables showing One-to-one, One-to-many, Many-to-many, and Reference Tables]
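To make the flattening step concrete, here is a minimal Python sketch that shreds one person with multi-valued phones into single-valued rows. The table layouts (`person_rows`, `phone_rows`, `purpose_rows`) and the generated phone keys are assumptions for illustration, not a schema from the slides.

```python
# A nested person with a multi-valued attribute (phones), where each phone
# itself has a multi-valued attribute (purposes).
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
}


def normalize(doc):
    """Flatten one nested person into three tables of single-valued rows."""
    person_rows = [(doc["personId"], doc["personName"])]
    phone_rows = []    # (phoneId, personId, phoneNumber): one-to-many
    purpose_rows = []  # (phoneId, purpose): one row per value
    for phone_id, phone in enumerate(doc["personPhones"], start=1):
        phone_rows.append((phone_id, doc["personId"], phone["phoneNumber"]))
        for purpose in phone["phonePurpose"]:
            purpose_rows.append((phone_id, purpose))
    return person_rows, phone_rows, purpose_rows


person_rows, phone_rows, purpose_rows = normalize(person)
```

One small object becomes three tables and two joins; reassembling the person requires joining them back together on `personId` and `phoneId`.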

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it

personId 11

schemas [
  schema  type person  schemaVersion 10
  schema  type customer  schemaVersion 10
  schema  type companyRelationships  schemaVersion 10
]

triples [
  triple  subject 11  predicate hasSpouse  object 22
  triple  subject 11  predicate hasChild  object 33
  triple  subject 11  predicate hasChild  object 44
  triple  subject 11  predicate likesProduct  object 1111
  triple  subject 11  predicate hasEmployer  object 666
          tripleStatus active  tripleCreatedOn 2016-10-05
          tripleEffectivityStartDate 1966-06-06  tripleEffectivityEndDate null
]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free

  personPhones [
    phone  phonePurpose [ mobile primary ]  phoneNumber +1 (111) 111-1111
    phone  phonePurpose [ work ]  phoneNumber +1 (111) 111-1112
    phone  phonePurpose [ home ]  phoneNumber +1 (111) 111-1113
  ]

  personAddresses [
    address  addressPurpose [ shipping mailing ]  addressStreet [ 99 Smith Dr ]
             addressLocality Clinton  addressRegion UT
             addressPostalCode 84015  addressCountry United States
  ]

  personEmails [
    email  emailPurpose [ work primary ]  emailAddress mikesomeworkcom
    email  emailPurpose [ personal ]  emailAddress mikesomewherecom
  ]

  personWebsites [
    website  websitePurpose [ work ]  websiteUrl httpwwwmikecom
    website  websitePurpose [ linkedin primary ]  websiteUrl httpslinkedcommike
  ]

  personPaymentMethods [
    paymentMethod  paymentMethodType Visa  paymentMethodStatus Verified
                   creditCardNumber 1234123412341234  creditCardExpirationDate 2011-11-11
                   nameOnCreditCard MIKE BOWERS  creditCardValidationAddress mailing
    paymentMethod  paymentMethodType Checking  paymentMethodStatus Verified
                   bankRoutingNumber 11111111  bankAccountNumber 11111111
                   nameOnBankAccount Michael Bowers
                   driversLicenseNumber 11111111  driversLicenseState UT
  ]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company  companyIdFk 555  companyName Nabisco
           personCompanyRelationship [ employeeOf accountRepFor ]
           personCompanyStatus active  personCompanyEffectiveness Effective
           personCompanyStartDate 2001-01-01  personCompanyEndDate null
           accountRepSupportedProducts [ product  productIdFk 1111  productName Oreos ]
  company  companyIdFk 666  companyName Goliath Dairies
           personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
           personCompanyStatus active  personCompanyEffectiveness Effective
           personCompanyStartDate 2001-01-01  personCompanyEndDate null
           deliverySchedule 2 pm Mon-Fri  performanceNotes He is always late
]
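The following Python sketch illustrates the general idea of projecting a document into triples. It is a toy stand-in for what a template would do, not MarkLogic's TDE API: the function `extract_triples` and the predicates `hasName` and `hasPhone` are hypothetical, and the document is a trimmed, assumed-JSON version of the person document above.

```python
# Trimmed person document; the JSON shape is an assumption.
person_doc = {
    "personId": 11,
    "triples": [
        {"subject": 11, "predicate": "hasSpouse", "object": 22},
        {"subject": 11, "predicate": "hasEmployer", "object": 666},
    ],
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"],
             "phoneNumber": "+1 (111) 111-1111"},
        ],
    },
}


def extract_triples(doc):
    """Return (subject, predicate, object) tuples derived from one document."""
    subject = doc["personId"]
    # Triples stored explicitly inside the document.
    result = [(t["subject"], t["predicate"], t["object"])
              for t in doc.get("triples", [])]
    # Triples projected from ordinary properties (hypothetical predicates).
    person = doc["person"]
    result.append((subject, "hasName", person["personName"]))
    for phone in person.get("personPhones", []):
        result.append((subject, "hasPhone", phone["phoneNumber"]))
    return result


triples = extract_triples(person_doc)
```

The document stays normalized (one entity, embedded structures intact) while the derived triples give a flat, queryable view of the same facts.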

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
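A minimal Python sketch of "denormalizing without denormalizing": the company name is read through a linking triple rather than copied into the person document. This is plain Python over a list of tuples, not the Optic API; the helpers `objects` and `employer_names` and the `hasName` predicate are assumptions for illustration.

```python
# Triples as they might be projected from a person document (11) and a
# company document (666). The hasName predicate is hypothetical.
triples = [
    (11, "hasName", "Mike Bowers"),
    (11, "hasEmployer", 666),
    (666, "hasName", "Goliath Dairies"),
]


def objects(all_triples, subject, predicate):
    """Return every object matching the given subject and predicate."""
    return [o for s, p, o in all_triples if s == subject and p == predicate]


def employer_names(all_triples, person_id):
    """Follow hasEmployer links, then read each employer's name."""
    names = []
    for company_id in objects(all_triples, person_id, "hasEmployer"):
        names.extend(objects(all_triples, company_id, "hasName"))
    return names


result = employer_names(triples, 11)
```

If the company is renamed, only the company's own data changes; the person query picks up the new name through the link.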


Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure: example Business Tables, Transaction Tables, and Reference Tables]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
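As a sketch of the last point, one lookup document can hold several lookup lists and be used to check an order. The JSON shape and the way the status values are split are assumptions based on the slide's order lookup sample; `invalid_statuses` is a hypothetical helper.

```python
# One lookup document combining two lookup lists (assumed shape).
order_lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def invalid_statuses(order, lookup):
    """Return (property, value) pairs whose value is not in the lookup lists."""
    problems = []
    if order["orderStatus"] not in lookup["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for product in order["orderedProducts"]:
        status = product["productOrderStatus"]
        if status not in lookup["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


good_order = {
    "orderStatus": "Shipped",
    "orderedProducts": [{"productIdFk": 1111, "productOrderStatus": "Shipped"}],
}
bad_order = {
    "orderStatus": "Shipped",
    "orderedProducts": [{"productIdFk": 2222, "productOrderStatus": "Lost"}],
}
```

In a relational model each list would typically be its own reference table; here both live in a single document that is read once.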

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the subclassing in the person document: the customer and companyRelationships sections extend the base person
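One way to picture this kind of subclassing is to treat each optional section of the document as a subclass marker. The sketch below is a hypothetical reading of the person document, with an assumed JSON shape; the `roles` helper is not part of any MarkLogic API.

```python
def roles(doc):
    """List the roles a person document plays, based on which sections exist."""
    found = ["person"] if "person" in doc else []
    if "customer" in doc:
        found.append("customer")
    for relationship in doc.get("companyRelationships", []):
        for role in relationship["personCompanyRelationship"]:
            if role not in found:
                found.append(role)
    return found


# A person who is also a customer, an employee, and an account rep.
mike = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"companyIdFk": 555, "companyName": "Nabisco",
         "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
    ],
}

# A person with no extra sections is still a valid person document.
plain = {"personId": 22, "person": {"personName": "Hypothetical Spouse"}}
```

Because the subclass sections are named after what they mean (customer, companyRelationships), the generalized person document still shows its purpose.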


Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
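Why unique property names matter can be shown with a toy index keyed only by property name. This is a simplified stand-in for the idea, not MarkLogic's actual index implementation; `build_name_index` and both sample documents are hypothetical.

```python
from collections import defaultdict


def build_name_index(doc):
    """Map each property name to the set of scalar values found under it,
    ignoring where the property sits in the hierarchy."""
    index = defaultdict(set)

    def visit(key, node):
        if isinstance(node, dict):
            for child_key, child in node.items():
                visit(child_key, child)
        elif isinstance(node, list):
            for item in node:
                visit(key, item)  # array items keep the array's property name
        elif key is not None:
            index[key].add(node)

    visit(None, doc)
    return index


# Generic names: person and employer both use "name".
generic = {"person": {"name": "Mike Bowers"},
           "employer": {"name": "Goliath Dairies"}}

# Unique names: each property says which entity it belongs to.
unique = {"person": {"personName": "Mike Bowers"},
          "employer": {"companyName": "Goliath Dairies"}}

generic_index = build_name_index(generic)
unique_index = build_name_index(unique)
```

With generic names, a lookup on `name` for "Goliath Dairies" also matches this person document; with unique names, `personName` and `companyName` lookups stay separate without needing a path.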


Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

productWebsites [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs: Performance

One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O
• MongoDB indexes are B-tree indexes and may require 4 I/Os
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
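The I/O estimate on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `relational_io_range` and its parameters are illustrative and not part of the original slides. It only multiplies the per-join cost range quoted above by the number of joins.

```python
def relational_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return the (low, high) I/O estimate for fetching one entity through joins.

    index_ios and row_ios are the per-join ranges quoted on the slide:
    3-to-4 I/Os for indexes and 1-to-2 I/Os for the row itself.
    """
    low_per_join = index_ios[0] + row_ios[0]    # 3 + 1 = 4
    high_per_join = index_ios[1] + row_ios[1]   # 4 + 2 = 6
    return joins * low_per_join, joins * high_per_join


low, high = relational_io_range(13)
print(f"13 joins: {low}-to-{high} I/Os, versus 1 I/O for a single document")
```

The same function can be reused for other entities by changing the join count, which makes it easy to see how quickly the relational cost grows with the number of tables.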

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Relational tables for the same data:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
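To make the contrast with the three relational tables concrete, here is a small sketch of the same address data held as a multi-valued property inside one document. The exact JSON nesting is an assumption, because the slide text lost its punctuation, and `addresses_for` is a hypothetical helper, not a MarkLogic API.

```python
# One document carries the person, the addresses, and the address purposes.
# The key names come from the slide; the nesting is assumed.
person = {
    "personId": 11,
    "personAddresses": [
        {
            "addressPurpose": ["shipping", "mailing"],  # multi-valued property
            "addressStreet": ["99 Smith Dr"],           # multi-valued property
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }
    ],
}


def addresses_for(doc, purpose):
    """Return every address in the document that lists the given purpose."""
    return [a for a in doc["personAddresses"] if purpose in a["addressPurpose"]]


shipping = addresses_for(person, "shipping")
print(shipping[0]["addressLocality"])
```

No join is needed: the many-to-many relationship between addresses and purposes lives in the `addressPurpose` array itself.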

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
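The sketch below shows how application code might consume such a mixed array. The two records mirror the slide's credit card and checking account entries; the nesting and the `describe` helper are assumptions made for illustration.

```python
# Two differently shaped records share one array; each carries only its own fields.
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    },
]


def describe(method):
    """Summarize a payment method by inspecting which fields are present."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


descriptions = [describe(m) for m in payment_methods]
print(descriptions)
```

In a relational schema the same data would typically need either one table per payment type or one wide table full of nulls.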

Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 575
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
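As a rough illustration of querying structure rather than values, the following sketch walks a document and collects every key path, then tests for the presence of a key and for a nested relationship. The `paths` helper and the document nesting are assumptions; a real MarkLogic deployment would use its own indexes and query APIs instead.

```python
def paths(node, prefix=()):
    """Yield every key path in a nested JSON-like value.

    Arrays do not add a path segment; their items are walked in place.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield prefix + (key,)
            yield from paths(value, prefix + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from paths(item, prefix)


person = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}

all_paths = set(paths(person))

# Presence of a key anywhere in the document.
has_hide_flag = any(path[-1] == "hidePhoneNumber" for path in all_paths)

# A nested relationship: hidePhoneNumber occurs under person > personPhones.
nested_ok = ("person", "personPhones", "hidePhoneNumber") in all_paths

# Changing structure is just a data update: add a key, and it becomes queryable.
person["person"]["personNickname"] = "Mikey"
nickname_present = ("person", "personNickname") in set(paths(person))
```

Because the second phone record adds `hidePhoneNumber` without any schema change, the structural query finds it as soon as the document is stored.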

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. CODD
IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: a graph of business entities connected by labeled relationships. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Relationship labels: "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", "solution".]

[Figure: Customer Order model. JSON documents (Customers, Orders, Products, Vendors) and XML Documentation are connected by labeled relationships: "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", "Vendor received RMA on product".]

XML example:
Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size     Dose UOM
Minicillan    Drugs R Us            200           mg
Maxicillan    Canada4Less Drugs     400           mg
Minicillan    Drug USA              150           mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledgeWhat can RDF Graph and JSON do for data
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

DocGraph Modeling: Normalize

• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
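The `triples` array embedded in the person document above can be thought of as input to a triple index. The sketch below imitates that idea in plain Python; it is not TDE or the Optic API, the helper names are hypothetical, and the extra properties on the `hasEmployer` triple (status and effectivity dates) are left out for brevity.

```python
# Triples copied from the person document; the JSON nesting is assumed.
person_doc = {
    "personId": 11,
    "triples": [
        {"subject": 11, "predicate": "hasSpouse", "object": 22},
        {"subject": 11, "predicate": "hasChild", "object": 33},
        {"subject": 11, "predicate": "hasChild", "object": 44},
        {"subject": 11, "predicate": "likesProduct", "object": 1111},
        {"subject": 11, "predicate": "hasEmployer", "object": 666},
    ],
}


def build_triple_index(docs):
    """Collect (subject, predicate, object) tuples from each document's triples array."""
    index = set()
    for doc in docs:
        for triple in doc.get("triples", []):
            index.add((triple["subject"], triple["predicate"], triple["object"]))
    return index


def objects_of(index, subject, predicate):
    """Return the sorted objects linked to a subject by a given predicate."""
    return sorted(o for s, p, o in index if s == subject and p == predicate)


index = build_triple_index([person_doc])
children = objects_of(index, 11, "hasChild")
print(children)
```

The document stays normalized (one person per document) while the index gives a cross-document view of relationships.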

DocGraph Modeling: Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
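One way to picture "denormalizing without denormalizing" is a query-time view that follows a triple to a linked document instead of copying its data. The sketch below is plain Python under stated assumptions: the in-memory `docs` store and `employer_view` helper are hypothetical, and the identifiers (person 11, company 666, Goliath Dairies) are taken from the sample person document.

```python
# Each document stores only its own data; nothing is copied between them.
docs = {
    11: {"personName": "Mike Bowers"},
    666: {"companyName": "Goliath Dairies"},
}

# A triple links the person to the company.
triples = [(11, "hasEmployer", 666)]


def employer_view(person_id):
    """Return a person row enriched with the employer name, resolved at query time."""
    row = {"personId": person_id, "personName": docs[person_id]["personName"]}
    for subject, predicate, obj in triples:
        if subject == person_id and predicate == "hasEmployer":
            row["employerName"] = docs[obj]["companyName"]
    return row


view = employer_view(11)
print(view)
```

If the company is renamed, only the company document changes; the view picks up the new name on the next query, so there is no copied value to keep in sync.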

Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure labels: Business Tables, Transaction Tables, Reference Tables]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

Relational Modeling Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below
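The subclassing idea in the bullets above can be sketched in plain Python. This is only an illustration, not MarkLogic code: the `roles_of` helper is hypothetical, and the document is a trimmed version of the person example in which each subclass is simply an optional section of the same object.

```python
# Trimmed person document: the base "person" section plus two optional
# subclass sections ("customer" and "companyRelationships").
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def roles_of(doc):
    """Hypothetical helper: list the roles that one person document plays."""
    roles = ["person"]
    if "customer" in doc:
        roles.append("customer")
    for entry in doc.get("companyRelationships", []):
        roles.extend(entry["company"]["personCompanyRelationship"])
    return roles


print(roles_of(person_doc))
```

Because every role lives in a named section, the general "person" document still says plainly which roles it plays, which is the point the slide makes about not hiding the purpose of the model.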

personId 11

schemas [
schema type person schemaVersion 10
schema type customer schemaVersion 10
schema type companyRelationships schemaVersion 10
]

triples [
triple subject 11 predicate hasSpouse object 22
triple subject 11 predicate hasChild object 33
triple subject 11 predicate hasChild object 44
triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666
tripleStatus active tripleCreatedOn 2016-10-05
tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null
]

person
personStatus active
personName Mike Bowers
personOnlineName MTB1
personNameVariations
personFirstName Mike
personLastName Bowers
middleName Thomas
personMaidenName null
personLegalName Michael Thomas Bowers
personSortedName Bowers Mike
personNickname Mikey
personInformalLetterName Mike
personFormalLetterName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free

personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
]

personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
]

personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com
email emailPurpose [ personal ] emailAddress mike@somewhere.com
]

personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com
website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike
]

personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus Verified
bankRoutingNumber 11111111 bankAccountNumber 11111111
nameOnBankAccount Michael Bowers
driversLicenseNumber 11111111 driversLicenseState UT
]

customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111

companyRelationships [
company companyIdFk 555 companyName Nabisco
personCompanyRelationship [ employeeOf accountRepFor ]
personCompanyStatus active personCompanyEffectiveness Effective
personCompanyStartDate 2001-01-01 personCompanyEndDate null
accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies
personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
personCompanyStatus active personCompanyEffectiveness Effective
personCompanyStartDate 2001-01-01 personCompanyEndDate null
deliverySchedule 2 pm Mon-Fri
performanceNotes He is always late
]

Relational Modeling Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality (range) indexes based on property name or hierarchical path
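Why unique property names matter can be seen with a toy equality index written in plain Python. This is a sketch under stated assumptions and not how MarkLogic implements its indexes: `index_by_property_name` and the four sample documents are hypothetical, and the only behaviour borrowed from the bullets above is that a value is indexed by property name without regard to where the property sits in the hierarchy.

```python
from collections import defaultdict


def index_by_property_name(docs):
    """Toy equality index: (property name, value) -> set of document ids.

    The position of a property in the hierarchy is deliberately ignored.
    """
    index = defaultdict(set)

    def walk(node, doc_id):
        if isinstance(node, dict):
            for key, value in node.items():
                values = value if isinstance(value, list) else [value]
                for item in values:
                    if isinstance(item, (dict, list)):
                        walk(item, doc_id)
                    else:
                        index[(key, item)].add(doc_id)
        elif isinstance(node, list):
            for item in node:
                walk(item, doc_id)

    for doc_id, doc in docs.items():
        walk(doc, doc_id)
    return index


docs = {
    # Unique names: the property name alone says which entity is meant.
    "person-11": {"person": {"personName": "Mike Bowers"}},
    "order-1": {"order": {"customer": {"customerName": "Mike Bowers"}}},
    # Generic names: the same key is reused under different parents.
    "generic-a": {"person": {"name": "Nabisco Fan"}},
    "generic-b": {"company": {"name": "Nabisco Fan"}},
}
index = index_by_property_name(docs)

print(index[("personName", "Mike Bowers")])
print(index[("name", "Nabisco Fan")])
```

A lookup on `personName` returns only the person document, while a lookup on the generic `name` returns both a person and a company, so the name-based index alone cannot tell them apart.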

Agenda

1. Database Modeling Paradigms

2. From Relational to Document Graph

3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

productWebsites [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111

orderStatus [ New, Invoiced, Shipped, Closed ]

productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Document PROs Simplicity

Each Business Entity Is a JSON Document

• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 45K and requires 13 tables to represent it in a relational database

• You have to join 13 tables to retrieve all of a person's data

• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row

• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
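The IO estimate above is simple arithmetic, and a few lines of Python make the range explicit. The constant and function names are made up for this sketch; the figures (13 joins, 3-to-4 index IOs and 1-to-2 row IOs per join, 1 IO per document) are the slide's own estimates, not measurements.

```python
JOINS = 13                   # tables joined to assemble one person
INDEX_IOS_PER_JOIN = (3, 4)  # low and high estimates per join
ROW_IOS_PER_JOIN = (1, 2)    # low and high estimates per join
DOCUMENT_IOS = 1             # the slide's figure for fetching one JSON document


def relational_io_range(joins=JOINS):
    """Return the (low, high) IO estimate for reading one person relationally."""
    low = joins * (INDEX_IOS_PER_JOIN[0] + ROW_IOS_PER_JOIN[0])
    high = joins * (INDEX_IOS_PER_JOIN[1] + ROW_IOS_PER_JOIN[1])
    return low, high


low, high = relational_io_range()
print(f"relational: {low}-to-{high} IOs, document: {DOCUMENT_IOS} IO")
```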

Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
]

The same data in relational tables:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
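To make the comparison concrete, the sketch below shreds one person document into rows for the three tables named above. It is an illustration only: `shred_person`, the column names, and the exact nesting of the sample document are assumptions, since the slide text does not show where `personAddresses` sits in the hierarchy.

```python
def shred_person(doc):
    """Flatten one person document into rows for three relational tables."""
    person_id = doc["personId"]
    persons = [{"personId": person_id, "name": doc["person"]["personName"]}]
    addresses = []
    persons_addresses = []
    for address_id, entry in enumerate(doc["personAddresses"], start=1):
        address = entry["address"]
        addresses.append({
            "addressId": address_id,
            "street": ", ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postalCode": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # One link row per purpose: the multi-valued property becomes rows.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "personId": person_id,
                "addressId": address_id,
                "addressPurpose": purpose,
            })
    return persons, addresses, persons_addresses


person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }},
    ],
}
persons, addresses, persons_addresses = shred_person(person_doc)
print(len(persons), len(addresses), len(persons_addresses))
```

One nested array with a two-valued `addressPurpose` turns into one Persons row, one Addresses row, and two link rows, which is the shredding the document model avoids.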

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed

• Putting different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus Verified
bankRoutingNumber 11111111 bankAccountNumber 11111111
nameOnBankAccount Michael Bowers
driversLicenseNumber 11111111 driversLicenseState UT
]
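A mixed-type array like the one above is easy to consume in application code. The sketch below is a hypothetical example in plain Python (the `describe` helper is not part of any MarkLogic API): each record is handled according to the properties it actually carries.

```python
payment_methods = [
    {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
     "creditCardNumber": "1234123412341234", "nameOnCreditCard": "MIKE BOWERS"},
    {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
     "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
     "nameOnBankAccount": "Michael Bowers"},
]


def describe(method):
    """Hypothetical helper: summarise a payment method by its own shape."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognised shape)"


summaries = [describe(method) for method in payment_methods]
print(summaries)
```

In a relational model the two record shapes would typically need separate tables or a wide table full of nulls; here they simply share one array.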

Document PROs Everything is Data

Everything is data in JSON

• Structure

• Keys

• Values

• Arrays

Because structure is data

• Structure can be changed by updating data: just write a query

• Structure can be queried

  • For presence of keys

  • For nested relationships

  • For any combination of nesting, keys, and values

personId 11

person
personName Some very long name with many UTF-8 international characters…
personBirthDate 1982-02-02
personHeightInFeet 575
personPhones [
phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
]
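Both ideas, querying structure and changing structure by updating data, can be sketched in plain Python. The `paths_to_key` function and the nesting of the sample document are assumptions made for illustration; a database would answer the same question with its own query language.

```python
def paths_to_key(node, key, path=()):
    """Yield every path (tuple of keys and list indexes) at which `key` occurs."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from paths_to_key(value, key, path + (name,))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from paths_to_key(item, key, path + (position,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query the structure: where does the optional key appear?
before = list(paths_to_key(doc, "hidePhoneNumber"))

# Change the structure by updating data: give every phone the key.
for entry in doc["person"]["personPhones"]:
    entry["phone"].setdefault("hidePhoneNumber", False)

after = list(paths_to_key(doc, "hidePhoneNumber"))
print(before, len(after))
```

No schema migration is involved: the new property exists as soon as the update writes it, and the same structural query sees it immediately.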

Document PROs Simple Easy Data Types

• Attribute names have unlimited length and can contain any characters

• Strings have unlimited size and can contain any internationalized UTF-8 characters

• Numbers contain integers or decimals

• Booleans are true or false

• Arrays can have values of any type and can mix types

• Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
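The small set of JSON types maps directly onto native types in most languages. As a quick illustration using Python's standard `json` module (the sample text and its property values are made up for this sketch):

```python
import json

text = """
{
  "personName": "Zo\\u00eb M\\u00fcller",
  "personHeightInFeet": 5.75,
  "productOrderQty": 3,
  "hidePhoneNumber": true,
  "mixed": ["a", 1, false, {"nested": null}],
  "sparse": {}
}
"""
doc = json.loads(text)

# Map each top-level property to the name of the Python type it parsed into.
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)
```

Strings, integers, decimals, booleans, mixed arrays, and sparse objects all parse without any schema being declared first.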

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION

This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS

…Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence

…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence

…Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence

Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: graph of linked entities. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution]

[Figure: Customer Order. JSON documents (Customers, Orders, Products, Vendors) and XML Documentation linked by relationships: product that was ordered; vendor who sold the product; shipping instructions, etc.; customer invoices, annual reports, etc.; customer order documents; product manual; customer liked product; vendor received RMA on product]

XML Documentation example:

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size    Dose UOM
Minicillan   Drugs R Us           200          mg
Maxicillan   Canada4Less Drugs    400          mg
Minicillan   Drug USA             150          mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Same Realistic Customer Order Model
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Document PROs Simple Easy Data Types

DocGraph Modeling Denormalize

• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info

• When triples link documents, the Optic API can query triple data as if it were part of any linked document

• This is denormalizing without denormalizing

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]

triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null
]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free

  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
    phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
    phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
  ]

  personAddresses [
    address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
      addressLocality Clinton addressRegion UT
      addressPostalCode 84015 addressCountry United States
  ]

  personEmails [
    email emailPurpose [ work primary ] emailAddress mike@somework.com
    email emailPurpose [ personal ] emailAddress mike@somewhere.com
  ]

  personWebsites [
    website websitePurpose [ work ] websiteUrl http://www.mike.com
    website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike
  ]

  personPaymentMethods [
    paymentMethod paymentMethodType Visa paymentMethodStatus Verified
      creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
    paymentMethod paymentMethodType Checking paymentMethodStatus Verified
      bankRoutingNumber 11111111 bankAccountNumber 11111111
      nameOnBankAccount Michael Bowers
      driversLicenseNumber 11111111 driversLicenseState UT
  ]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company companyIdFk 555 companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company companyIdFk 666 companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]
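
The "denormalizing without denormalizing" idea can be sketched in plain Python. This is not MarkLogic code and it does not use TDE or the Optic API; the `DOCS` store, the `TRIPLES` list, and the `linked_view` helper are hypothetical names, and the data is a trimmed version of the slide's documents. The sketch only illustrates that when triples link documents, a query can present linked data next to a person without copying it into the person document.

```python
# In-memory stand-ins for a document store and a triple index (illustrative only).
DOCS = {
    11: {"personName": "Mike Bowers"},
    666: {"companyName": "Goliath Dairies"},
    1111: {"productName": "Oreo Thins Sandwich Cookies"},
}

# (subject, predicate, object) triples taken from the slide's person document.
TRIPLES = [
    (11, "hasEmployer", 666),
    (11, "likesProduct", 1111),
]


def linked_view(subject_id):
    """Return the subject's document plus the documents its triples point to."""
    view = dict(DOCS[subject_id])
    for subject, predicate, obj in TRIPLES:
        if subject == subject_id and obj in DOCS:
            # Linked data is looked up at query time, not stored a second time.
            view.setdefault(predicate, []).append(DOCS[obj])
    return view


view = linked_view(11)
print(view["hasEmployer"][0]["companyName"])
```

The stored person document stays unchanged; only the query result carries the employer and product data.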

Relational Modeling Orthogonalize

2. Orthogonalize

• Create business tables that stand independent of all contexts

• Create transaction tables to join together business tables

• Create reference tables to standardize entity states and attribute characteristics

• This maximizes data reuse by allowing tables to be combined with other tables to create any context

Diagram labels: Business Tables, Reference Tables, Transaction Tables

DocGraph Modeling Orthogonalize

• Everything about each entity should be in the entity

• A business entity should exactly match how users think of it

• A transaction should contain everything about the transaction

• Multiple lookup tables may be combined in one doc

(The slide shows five documents side by side. The order document is cut off at the bottom and the product and company documents are cut off at the right edge; cut-off values are marked with …)

personId 11
person
  personName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free
  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
  ]
  personAddresses [
    address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
      addressLocality Clinton addressRegion UT
      addressPostalCode 84015 addressCountry United States
  ]
  personEmails [
    email emailPurpose [ work primary ] emailAddress mike@somework.com
  ]
  personWebsites [
    website websitePurpose [ work ] websiteUrl http://www.mike.com
  ]
  personPaymentMethods [
    paymentMethod paymentMethodType Visa paymentMethodStatus Verified
      creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  ]
customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

orderId 1
order
  orderNumber 111-11-111
  orderDate 2001-01-01T09:00:00Z
  orderStatus Shipped
  customer
    customerId 11
    customerName Mike Bowers
  orderShipment
    shipper companyIdFk 777 companyName FedEx
    shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
  orderShippingAddress
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
  orderedProducts [
    product
      productIdFk 1111
      productName Oreo Thins Sandwich Cookies
      productOrderQty 3 productUnitPrice 350 productCondition New
      productWeightOz 11
      productOrderStatus Shipped
      productShippedDate 2001-01-01+0101
      seller sellerId 555 sellerName Nabisco
      supplier supplierId 555 supplierName Nabisco
    product
      productIdFk 2222
      productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
      productOrderQty 1 productUnitPrice 2 50 productCondition New
      …

productId 1111
product
  productCode oreo-111
  productStatus active
  productName Oreo Thins Sandwich Cookies
  productDescription YUMMY
  productCategories [ cookies chocolate cookies desserts snacks …
  productTags [ cookie chocolate snack treat ]
  productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
  productAvailabilityDate 2001-01-01
  productListPrice 450
  productDiscountPrice 350
  productStandardCost 250
  productInventoryReorderLevel 100
  productInventoryTargetLevel 1000
  productVendor vendorId 555 companyName Nabisco
  productSuppliers [
    productSupplier supplierId 555 companyName Nabisco
      standardSupplierProductPrice 250 currentSupplierProductPrice 2…
      supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu…
  productMeasures [
    productMeasure
      productMeasurePurpose productPackage productWeight 101 …
      productWidth 45 productHeight 17 productLength 67 …
    productMeasure
      productMeasurePurpose shippingPackage productWeight 1…
      productWidth 5 productHeight 2 productLength 8…
  productWebsites [ …

companyId 555
company
  companyName Nabisco
  companyStatus active
  companyPhones [
    phone phonePurpose sales …
    phone phonePurpose support …
    phone phonePurpose shipping …
  companyAddresses [
    address
      addressPurpose [ shipping …
      addressLocality East Hanove…
      addressPostalCode 07936 …
  companyEmails [
    email emailPurpose sales …
  companyWebsites [
    website websitePurpose home …
  companyPaymentMethods [
    paymentMethod
      paymentMethodType Che…
      paymentMethodAccountNumber 555…
      paymentMethodVerified true
supplier supplierJoinDate 2005-05…
seller sellerJoinDate 2005-05…

orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
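
A minimal Python sketch of the last two bullets, under stated assumptions: the dictionaries are trimmed, hand-built versions of the slide's order and order-lookup documents, the split of the lookup values into list items is a reading of the garbled slide text, and `invalid_statuses` is a hypothetical helper. It shows one transaction document carrying its own customer and product data, checked against a single lookup document that combines several lookup lists.

```python
# One lookup document holding several lookup lists.
order_lookup = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": [
        "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock",
    ],
}

# A trimmed order document: the transaction carries its own customer and product data.
order = {
    "orderId": 1,
    "orderStatus": "Shipped",
    "customer": {"customerId": 11, "customerName": "Mike Bowers"},
    "orderedProducts": [
        {"productIdFk": 1111, "productOrderStatus": "Shipped"},
    ],
}


def invalid_statuses(order_doc, lookup_doc):
    """Collect status values that are missing from the lookup lists."""
    problems = []
    if order_doc["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(order_doc["orderStatus"])
    for product in order_doc["orderedProducts"]:
        if product["productOrderStatus"] not in lookup_doc["productOrderStatus"]:
            problems.append(product["productOrderStatus"])
    return problems


print(invalid_statuses(order, order_lookup))
```

Because both lookup lists live in one document, a single read is enough to validate every status in the order.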

Relational Modeling Generalize

3. Generalize

• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change

• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.

• Do not over-generalize in relational, because it hides the purpose of the model

DocGraph Modeling Generalize

• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See the subclassing in the person document shown under DocGraph Modeling Denormalize: the schemas list and the customer and companyRelationships sections
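
The subclassing described above can be sketched in Python as follows. The nesting of the dictionary is an assumption (the slide's JSON punctuation did not survive extraction), the values are copied as they appear in the extracted text, and `roles` and `is_a` are hypothetical helpers rather than anything MarkLogic provides. The sketch shows one person document declaring several schema types and carrying one section per subclass.

```python
person_doc = {
    "personId": 11,
    # Each declared schema type acts as a subclass the document participates in.
    "schemas": [
        {"type": "person", "schemaVersion": 10},
        {"type": "customer", "schemaVersion": 10},
        {"type": "companyRelationships", "schemaVersion": 10},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {
            "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        },
    ],
}


def roles(doc):
    """List the schema types a document declares."""
    return [schema["type"] for schema in doc["schemas"]]


def is_a(doc, schema_type):
    """True when the type is declared and its section is present in the document."""
    return schema_type in roles(doc) and schema_type in doc


print(roles(person_doc))
print(is_a(person_doc, "customer"), is_a(person_doc, "supplier"))
```

Adding a new subclass means adding one schema entry and one section, which leaves the existing sections and their purpose visible.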

Relational Modeling Tune

4. Tune

• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling Tune

• The example documents in this presentation are tuned for MarkLogic

• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes (range indexes) based on property name or hierarchical path
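
Why unique property names matter can be illustrated with a toy inverted index in Python. This is a simplified model and not how MarkLogic implements its indexes; `walk`, `build_index`, and the document ids are hypothetical. The index is keyed on property name and value only, ignoring position in the hierarchy, which is the behavior the bullets describe.

```python
from collections import defaultdict


def walk(node, name=None):
    """Yield (property name, scalar value) pairs found at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, name)
    elif name is not None:
        yield name, node


def build_index(docs):
    """Map (property name, value) to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for name, value in walk(doc):
            index[(name, value)].add(doc_id)
    return index


docs = {
    "person-11": {"person": {"personName": "Mike Bowers"}},
    "order-1": {"order": {"customer": {"customerName": "Mike Bowers"}}},
    "company-555": {"company": {"companyName": "Nabisco"}},
}
index = build_index(docs)
print(sorted(index[("personName", "Mike Bowers")]))
```

With a generic property such as `name` in every document, the same lookup would return people, customers, and companies together, so the prefixed names (`personName`, `customerName`, `companyName`) keep each equality lookup precise.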

Agenda

1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs Simplicity

Each Business Entity Is a JSON Document

• The five JSON documents shown with DocGraph Modeling Orthogonalize (person, order, product, company, and order lookup) equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database

• You have to join 13 tables to retrieve all of a person's data

• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row

• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
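
The IO estimate above is simple arithmetic, worked here in Python. The per-join IO counts and the figure of 13 joins come from the slide; the variable names are illustrative.

```python
JOINS = 13                    # joins needed to assemble one person, per the slide
INDEX_IOS_PER_JOIN = (3, 4)   # low and high estimates for index traversal
ROW_IOS_PER_JOIN = (1, 2)     # low and high estimates for fetching the row

low_per_join = INDEX_IOS_PER_JOIN[0] + ROW_IOS_PER_JOIN[0]    # 3 + 1 = 4
high_per_join = INDEX_IOS_PER_JOIN[1] + ROW_IOS_PER_JOIN[1]   # 4 + 2 = 6

relational_low = low_per_join * JOINS     # 4 x 13 = 52
relational_high = high_per_join * JOINS   # 6 x 13 = 78
DOCUMENT_IOS = 1                          # one read when indexes are already in RAM

print(f"relational: {relational_low}-to-{relational_high} IOs, document: {DOCUMENT_IOS} IO")
```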


Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
]

Relational tables and their columns:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
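
To make the comparison concrete, here is a hedged Python sketch that assembles the multi-valued `personAddresses` property from rows of the three relational tables. The row values, the snake_case column keys, the `address_id` of 1, and the `is_primary` flags are illustrative assumptions, and `to_document` is a hypothetical helper. The link table contributes the multi-valued `addressPurpose`.

```python
persons = [{"person_id": 11, "name": "Mike Bowers"}]

addresses = [{
    "address_id": 1, "street": "99 Smith Dr", "locality": "Clinton",
    "region": "UT", "postal_code": "84015", "country": "United States",
}]

# Link table: one row per (person, address, purpose).
persons_addresses = [
    {"person_id": 11, "address_id": 1, "address_purpose": "shipping", "is_primary": False},
    {"person_id": 11, "address_id": 1, "address_purpose": "mailing", "is_primary": False},
]


def to_document(person_id):
    """Join the three tables into one document with a multi-valued address list."""
    person = next(p for p in persons if p["person_id"] == person_id)
    address_by_id = {a["address_id"]: a for a in addresses}
    purposes_by_address = {}
    for link in persons_addresses:
        if link["person_id"] == person_id:
            purposes_by_address.setdefault(link["address_id"], []).append(
                link["address_purpose"])
    return {
        "personId": person["person_id"],
        "personName": person["name"],
        "personAddresses": [
            {
                "addressPurpose": purposes,
                "addressStreet": [address_by_id[address_id]["street"]],
                "addressLocality": address_by_id[address_id]["locality"],
                "addressRegion": address_by_id[address_id]["region"],
                "addressPostalCode": address_by_id[address_id]["postal_code"],
                "addressCountry": address_by_id[address_id]["country"],
            }
            for address_id, purposes in purposes_by_address.items()
        ],
    }


doc = to_document(11)
print(doc["personAddresses"][0]["addressPurpose"])
```

Two link rows and one address row collapse into a single array element, which is the many-to-many relationship captured inside one document.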

Document PROs Multi-Type Arrays

Any JSON data can be multi-typed

• Different types of records in the same array are natural in programming but very difficult to do in relational databases

personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111 driversLicenseState UT
]
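
A short Python sketch of working with such a multi-typed array, assuming each array element is a flat object with the property names shown on the slide; `describe` is a hypothetical helper. Each record is handled according to the properties it actually carries, so credit card and bank account records can share one array.

```python
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "nameOnCreditCard": "MIKE BOWERS",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    },
]


def describe(method):
    """Summarize a payment method based on which properties it has."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f'{kind} card ending in {method["creditCardNumber"][-4:]}'
    if "bankAccountNumber" in method:
        return f'{kind} account ending in {method["bankAccountNumber"][-4:]}'
    return f"{kind} (unrecognized shape)"


for method in payment_methods:
    print(describe(method))
```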

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data (just write a query)
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

personId 11

person
personName Some very long name with many UTF-8 international characters…
personBirthDate 1982-02-02
personHeightInFeet 575
personPhones [

phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222

phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true

]
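To make "structure is data" concrete, here is a minimal Python sketch that queries a document for the presence of a key at any depth and then changes the structure with an ordinary update. The `find_paths` function is a hypothetical helper, and the document is a simplified version of the example above (the `phone` wrapper is omitted for brevity).

```python
def find_paths(node, key, path=()):
    """Yield the path to every occurrence of `key` anywhere in a nested JSON value."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,)
            yield from find_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_paths(item, key, path + (i,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}

# Query the structure: which phones carry the optional hidePhoneNumber key?
hidden_paths = list(find_paths(doc, "hidePhoneNumber"))

# Change the structure by updating data: add the key wherever it is missing.
for phone in doc["person"]["personPhones"]:
    phone.setdefault("hidePhoneNumber", False)

all_paths = list(find_paths(doc, "hidePhoneNumber"))
```

No schema change is involved: the second phone simply has a key the first one lacks until the update adds it.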

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
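The data types listed above can be seen by parsing a small document with Python's standard `json` module. The keys and values below are invented for illustration and do not come from the talk.

```python
import json

# One JSON object exercising each type from the list above.
text = """
{
  "a key with spaces, punctuation & accents: café": "text in any script: 日本語",
  "integer": 42,
  "decimal": 5.75,
  "flag": true,
  "mixed": [1, "two", false, {"three": 3}],
  "nested": {"sparse": {}}
}
"""

doc = json.loads(text)

# Map each top-level key to the Python type the parser produced for its value.
type_names = [type(value).__name__ for value in doc.values()]
```

The `mixed` array holds a number, a string, a boolean, and an object side by side, which is the multi-typed case the list describes.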

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6

June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of related business entities. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: a multi-model database. Node labels: Customer Order; JSON Customers; JSON Orders; JSON Products; JSON Vendors; XML Documentation (illustrated by a hospital operation record with a drug table). Relationship labels: product that was ordered; vendor who sold the product; shipping instructions, etc.; customer invoices, annual reports, etc.; customer order documents; product manual; customer liked product; vendor received RMA on product.]

Inside and Out


Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size     Dose UOM
Minicillan    Drugs R Us            200           mg
Maxicillan    Canada4Less Drugs     400           mg
Minicillan    Drug USA              150           mg


Relational Modeling: Orthogonalize

2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context

[Figure columns: Business Tables, Transaction Tables, Reference Tables]


DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
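As a sketch of the last bullet, the Python below holds two lookup lists in a single document and uses it to validate an order. The status values follow the order lookup document shown in the slides; the `invalid_statuses` helper, the trimmed-down order, and the "Backordered" value are assumptions added for illustration.

```python
# One lookup document standing in for two relational reference tables.
order_lookup = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def invalid_statuses(order, lookup):
    """Hypothetical check: return status values in the order that the lookup doc does not list."""
    problems = []
    if order["orderStatus"] not in lookup["orderStatus"]:
        problems.append(order["orderStatus"])
    for product in order["orderedProducts"]:
        if product["productOrderStatus"] not in lookup["productOrderStatus"]:
            problems.append(product["productOrderStatus"])
    return problems


order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"productName": "Oreo Thins Sandwich Cookies", "productOrderStatus": "Shipped"},
        # "Backordered" is a made-up value that the lookup doc does not contain.
        {"productName": "Rice Dream Rice Drink", "productOrderStatus": "Backordered"},
    ],
}

problems = invalid_statuses(order, order_lookup)
```

The order itself is one self-contained transaction document, while the shared vocabulary lives in a single lookup document instead of several small tables.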

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
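The subclassing shown in the document above can be sketched in Python as optional sections on one general person document. The section and property names follow the example; the flattened shape (no `schema` or `company` wrappers) and the `roles` helper are simplifying assumptions for this sketch.

```python
person_doc = {
    "personId": 11,
    "person": {"personStatus": "active", "personName": "Mike Bowers"},
    # Optional "subclass" sections: present only when the role applies.
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"companyName": "Nabisco",
         "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
        {"companyName": "Goliath Dairies",
         "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]},
    ],
}


def roles(doc):
    """Hypothetical helper: list the roles a person document plays."""
    found = ["person"]
    if "customer" in doc:
        found.append("customer")
    for relationship in doc.get("companyRelationships", []):
        found.extend(relationship["personCompanyRelationship"])
    return found


person_roles = roles(person_doc)
plain_roles = roles({"personId": 22, "person": {"personName": "Someone Else"}})
```

Because each role is a named section, the purpose of the data stays visible in the document even though the top-level entity is the general `person`.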

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality (range) indexes based on property name or hierarchical path
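To show why unique property names matter when equality lookups are keyed by name alone, here is a toy Python index. It is only a conceptual sketch under that assumption, not a description of how MarkLogic implements its indexes; the function name and the sample documents are invented.

```python
from collections import defaultdict


def build_name_value_index(docs):
    """Toy index keyed by (property name, value), ignoring where the property sits."""
    index = defaultdict(set)

    def walk(doc_id, node):
        if isinstance(node, dict):
            for name, value in node.items():
                if isinstance(value, (dict, list)):
                    walk(doc_id, value)
                else:
                    index[(name, value)].add(doc_id)
        elif isinstance(node, list):
            for item in node:
                walk(doc_id, item)

    for doc_id, doc in docs.items():
        walk(doc_id, doc)
    return index


docs = {
    # Generic names: "status" means different things at different levels.
    "generic-1": {"person": {"status": "active"}, "customer": {"status": "closed"}},
    "generic-2": {"person": {"status": "inactive"}, "customer": {"status": "active"}},
    # Unique names: the property name alone says which status is meant.
    "unique-1": {"person": {"personStatus": "active"}, "customer": {"customerStatus": "closed"}},
    "unique-2": {"person": {"personStatus": "inactive"}, "customer": {"customerStatus": "active"}},
}

index = build_name_value_index(docs)
ambiguous = index[("status", "active")]
precise = index[("customerStatus", "active")]
```

With the generic name, a lookup for active customers also returns the document whose person is active; with `customerStatus`, the name-based lookup is exact without needing a path.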

personId 11

schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]

triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111

triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free

personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]

personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]

personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]

personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]

personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]

customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111

companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]

company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
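The IO estimate above can be reproduced with a few lines of Python. The per-join figures are the ones quoted on the slide; the variable names are only for this worked example.

```python
# Per-join cost ranges quoted on the slide, as (low, high) pairs.
INDEX_IOS = (3, 4)   # IOs spent reading index pages
ROW_IOS = (1, 2)     # IOs spent reading the row itself
JOINS = 13           # tables joined to rebuild one person

# Cost of a single join: index IOs plus row IOs.
per_join = (INDEX_IOS[0] + ROW_IOS[0], INDEX_IOS[1] + ROW_IOS[1])

# Cost of retrieving the whole person from the relational model.
relational_ios = (per_join[0] * JOINS, per_join[1] * JOINS)

# Cost of retrieving the same person as one document, per the slide.
document_ios = 1

print(f"per join: {per_join[0]}-to-{per_join[1]} IOs")
print(f"relational: {relational_ios[0]}-to-{relational_ios[1]} IOs")
print(f"document: {document_ios} IO")
```

These are the slide's rough estimates rather than measurements; caching and buffer pools will change the real numbers in any given system.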


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

Relational equivalent (three tables):

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
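To make the contrast concrete, here is a minimal Python sketch that shreds the multi-valued addresses of one person document into rows for the three tables listed on this slide. It is an illustration only: the `shred_person` helper, the generated `address_id` surrogate keys, the row layout, and the exact nesting of the trimmed document are assumptions, not something the slides specify.

```python
import json

# Trimmed copy of the slide's person document; the nesting is an assumed reading.
person_doc = json.loads("""
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"],
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ]
  }
}
""")


def shred_person(doc):
    """Split one person document into rows for the three relational tables."""
    person_id = doc["personId"]
    persons = [{"person_id": person_id, "name": doc["person"]["personName"]}]
    addresses = []
    persons_addresses = []
    for address_id, item in enumerate(doc["person"]["personAddresses"], start=1):
        address = item["address"]
        addresses.append({
            "address_id": address_id,  # hypothetical surrogate key
            "street": "; ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # Each value of the multi-valued purpose becomes its own junction row.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "person_id": person_id,
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred_person(person_doc)
print(len(persons), len(addresses), len(persons_addresses))  # prints: 1 1 2
```

One address with two purposes already produces four rows across three tables, while the document keeps the same information in a single nested array.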

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
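As a rough illustration of why mixed-type arrays are natural in application code, the Python sketch below walks the two payment methods from this slide and picks fields according to each record's own `paymentMethodType`. The `describe` helper and its output wording are hypothetical; only the property names and values come from the slide.

```python
# The two records from the slide: same array, different shapes.
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {
        "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(method):
    """Return a short label, reading only the fields this record type has."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa card ending in {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending in {method['bankAccountNumber'][-4:]}"
    return f"Unrecognized payment method: {kind}"


labels = [describe(item["paymentMethod"]) for item in payment_methods]
print(labels)
```

In a relational schema the same data would typically need either one wide table full of nulls or one table per payment type plus a supertype table.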

Document PROs: Everything is Data

Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays

Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": true } }
    ]
  }
}
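The idea that structure can be queried like any other data can be sketched in plain Python. The `find_paths` function below is a hypothetical helper, not a MarkLogic API: it reports where a key occurs at any nesting depth in the person document from this slide, whose nesting is an assumed reading.

```python
def find_paths(node, key, path=()):
    """Yield the path to every occurrence of `key`, at any nesting depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from find_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_paths(value, key, path + (index,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Presence of a key: only the second phone carries hidePhoneNumber.
hidden = list(find_paths(doc, "hidePhoneNumber"))
# Nested relationship: every phoneNumber sits under person > personPhones.
numbers = list(find_paths(doc, "phoneNumber"))
print(hidden)
print(len(numbers))
```

Because the sparse `hidePhoneNumber` key is just data, adding it to one phone required no schema change, and the same traversal finds it.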

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. Codd
IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: knowledge graph. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. Document collections: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), and Documentation (XML). Relationship labels: product that was ordered; vendor who sold the product; shipping instructions, etc.; customer invoices, annual reports, etc.; customer order documents; product manual; customer liked product; vendor received RMA on product.]

Sample XML document shown in the figure:

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Inside and Out

DocGraph Modeling: Orthogonalize

• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
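The last point can be illustrated with a small Python sketch in which a single lookup document holds the value lists that would otherwise live in separate lookup tables. The list values follow the deck's lookup document as best it can be read from the extraction; the `invalid_statuses` helper and the trimmed order are assumptions made for the example.

```python
# One lookup document holding two lists that relational modeling
# would normally place in two separate lookup tables.
lookup_doc = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}

# Trimmed order document; only the status properties matter here.
order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}},
        ],
    },
}


def invalid_statuses(order_doc, lookups):
    """Return (property, value) pairs whose value is not in its lookup list."""
    body = order_doc["order"]
    problems = []
    if body["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", body["orderStatus"]))
    for item in body["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


problems = invalid_statuses(order, lookup_doc)
print(problems)
```

Here "Lost" is a deliberately invalid value added for the example, so the check reports one problem and accepts the order-level status.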

Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below

{ "personId": 11,

  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],

  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06",
                  "tripleEffectivityEndDate": null } }
  ],

  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",

    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],

    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ],
          "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],

    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],

    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],

    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },

  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },

  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}
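As a hedged sketch of how an application might read this generalized document, the Python below lists which declared schema types are actually present as sections, follows the embedded triples, and collects the person's roles per company. The helper names (`subclasses`, `objects_of`, `company_roles`) are hypothetical, and the trimmed document assumes the same nesting as the example on this slide.

```python
# Trimmed version of the generalized person document.
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": "1.0"}},
        {"schema": {"type": "customer", "schemaVersion": "1.0"}},
        {"schema": {"type": "companyRelationships", "schemaVersion": "1.0"}},
    ],
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def subclasses(doc):
    """Schema types the document declares and also carries as a section."""
    declared = [entry["schema"]["type"] for entry in doc["schemas"]]
    return [kind for kind in declared if kind in doc]


def objects_of(doc, predicate):
    """Objects of every embedded triple with the given predicate."""
    return [entry["triple"]["object"] for entry in doc["triples"]
            if entry["triple"]["predicate"] == predicate]


def company_roles(doc):
    """Map each related company name to the roles the person holds there."""
    return {entry["company"]["companyName"]:
            entry["company"]["personCompanyRelationship"]
            for entry in doc["companyRelationships"]}


print(subclasses(person_doc))
print(objects_of(person_doc, "hasChild"))
print(company_roles(person_doc)["Nabisco"])
```

Each subclass is just another named section of the same document, so adding a new role means adding a section and a schema entry rather than a new table.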

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
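Why unique property names matter for a name-based index can be shown with a toy model in Python. This is not MarkLogic's index implementation: the `index_document` function, the dictionary-of-sets index, and the document URIs are all assumptions chosen to mimic an equality index keyed by property name and value, regardless of nesting depth.

```python
from collections import defaultdict


def index_document(index, uri, node, name=None):
    """Record (property name, scalar value) -> URIs, ignoring nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            index_document(index, uri, value, key)
    elif isinstance(node, list):
        for value in node:
            # Array items are indexed under the array's property name.
            index_document(index, uri, value, name)
    elif name is not None:
        index[(name, node)].add(uri)


# Generic names: "id" means something different in each document.
generic = defaultdict(set)
index_document(generic, "/person/11.json", {"id": 11, "name": "Mike Bowers"})
index_document(generic, "/product/11.json", {"id": 11, "name": "Oreo Thins"})

# Unique names, as in the deck's examples: personId versus productId.
unique = defaultdict(set)
index_document(unique, "/person/11.json",
               {"personId": 11, "person": {"personName": "Mike Bowers"}})
index_document(unique, "/product/11.json",
               {"productId": 11, "product": {"productName": "Oreo Thins"}})

print(sorted(generic[("id", 11)]))       # both documents match
print(sorted(unique[("personId", 11)]))  # only the person matches
```

With generic names the lookup cannot tell a person from a product without extra filtering, while unique names let the name-and-value lookup answer the question on its own.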

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs: Performance

One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os

• MarkLogic indexes are in RAM, so getting a doc is 1 I/O

• MongoDB indexes are B-tree indexes and may require 4 I/Os

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
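The I/O estimate above is simple arithmetic, and a short sketch makes the range explicit. The function name and parameters are illustrative only; the per-join costs (3-to-4 index I/Os plus 1-to-2 row I/Os) are the figures quoted on the slide, not measurements.

```python
def join_io_range(joins, index_io=(3, 4), row_io=(1, 2)):
    """Return the (low, high) I/O estimate for a number of joins.

    Each join is assumed to cost some index I/Os plus some row I/Os,
    using the per-join ranges quoted on the slide.
    """
    low = joins * (index_io[0] + row_io[0])
    high = joins * (index_io[1] + row_io[1])
    return low, high


# 13 tables joined to rebuild one person, versus 1 I/O for one document
relational_low, relational_high = join_io_range(13)
document_io = 1
print(f"relational: {relational_low}-to-{relational_high} I/Os, document: {document_io} I/O")
```

Changing the number of joins or the per-join cost shows how quickly the relational estimate grows, while the document read stays at one I/O under the slide's assumption.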

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

Relational tables needed for the same data:
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
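To make the contrast concrete, here is a minimal sketch that shreds the multi-valued address array into rows for the Addresses and Persons Addresses tables. The dictionary layout is an assumed JSON rendering of the slide's example, the surrogate Address ID is hypothetical, and the Is Primary column is left out for brevity.

```python
person = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }}
    ],
}


def shred_addresses(doc):
    """Split one document's address array into two sets of relational rows."""
    addresses = []         # rows for the Addresses table
    person_addresses = []  # rows for the Persons Addresses join table
    for address_id, item in enumerate(doc["personAddresses"], start=1):
        a = item["address"]
        addresses.append((
            address_id,
            ", ".join(a["addressStreet"]),
            a["addressLocality"],
            a["addressRegion"],
            a["addressPostalCode"],
            a["addressCountry"],
        ))
        # One join row per purpose: the multi-valued property becomes many rows
        for purpose in a["addressPurpose"]:
            person_addresses.append((doc["personId"], address_id, purpose))
    return addresses, person_addresses


addresses, person_addresses = shred_addresses(person)
print(addresses)
print(person_addresses)
```

One array element with two purposes turns into one Addresses row and two join rows, which is the shredding the document model avoids.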

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
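A short sketch of how application code might walk such a mixed array. The JSON layout is an assumed rendering of the slide's example, and the `describe` helper is hypothetical; it simply branches on which properties each record carries.

```python
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    }},
]


def describe(item):
    """Summarize a payment method based on the properties it actually has."""
    m = item["paymentMethod"]
    if "creditCardNumber" in m:
        return f'{m["paymentMethodType"]} card ending in {m["creditCardNumber"][-4:]}'
    if "bankAccountNumber" in m:
        return f'{m["paymentMethodType"]} account ending in {m["bankAccountNumber"][-4:]}'
    return m["paymentMethodType"]


summaries = [describe(p) for p in payment_methods]
print(summaries)
```

In a relational model the two record shapes would typically need separate tables or a wide table with many nulls; here they sit side by side in one array.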


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 5.75
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
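Querying structure for the presence of a key can be sketched in a few lines of plain Python. This is only an in-memory illustration with an assumed JSON rendering of the example above; it is not how MarkLogic evaluates such queries, and `key_paths` is a hypothetical helper.

```python
def key_paths(node, key, path=()):
    """Yield the path to every occurrence of `key` in nested dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from key_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            yield from key_paths(item, key, path + (position,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

hidden = list(key_paths(doc, "hidePhoneNumber"))
numbers = list(key_paths(doc, "phoneNumber"))
print(hidden)
print(len(numbers))
```

Because the key itself is data, the same walk answers both "does this property exist anywhere?" and "where is it nested?".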

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
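As a quick illustration of these types, the sketch below parses a small JSON string with Python's standard `json` module and reports the type each value maps to. The property names and values are made up for the example.

```python
import json

text = """
{
  "personName": "Søren",
  "personHeightInFeet": 5.75,
  "orderCount": 3,
  "hidePhoneNumber": true,
  "tags": ["cookie", 3, true, null],
  "person": {}
}
"""

doc = json.loads(text)

# Map each top-level property to the name of the Python type it parsed into
types = {name: type(value).__name__ for name, value in doc.items()}
print(types)
```

The mixed-type array and the empty object both parse without any schema, which is the flexibility the list above describes.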

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Diagram: graph of linked entities. Legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Diagram: Customer Order data held as linked documents. Nodes: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), Documentation (XML). Edge labels: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Example XML document:
Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size    Dose UOM
Minicillan   Drugs R Us           200          mg
Maxicillan   Canada4Less Drugs    400          mg
Minicillan   Drug USA             150          mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49



Relational Modeling: Generalize

3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational because it hides the purpose of the model

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]

triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null
]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free

  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
    phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
    phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
  ]

  personAddresses [
    address
      addressPurpose [ shipping mailing ]
      addressStreet [ 99 Smith Dr ]
      addressLocality Clinton
      addressRegion UT
      addressPostalCode 84015
      addressCountry United States
  ]

  personEmails [
    email emailPurpose [ work primary ] emailAddress mike@somework.com
    email emailPurpose [ personal ] emailAddress mike@somewhere.com
  ]

  personWebsites [
    website websitePurpose [ work ] websiteUrl http://www.mike.com
    website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike
  ]

  personPaymentMethods [
    paymentMethod
      paymentMethodType Visa
      paymentMethodStatus Verified
      creditCardNumber 1234123412341234
      creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS
      creditCardValidationAddress mailing
    paymentMethod
      paymentMethodType Checking
      paymentMethodStatus Verified
      bankRoutingNumber 11111111
      bankAccountNumber 11111111
      nameOnBankAccount Michael Bowers
      driversLicenseNumber 11111111
      driversLicenseState UT
  ]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company
    companyIdFk 555
    companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active
    personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01
    personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company
    companyIdFk 666
    companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active
    personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01
    personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]
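One way to read the subclassing in this document is that each optional section (customer, companyRelationships) adds a role to the same person. The sketch below assumes a JSON rendering of a trimmed copy of the example and derives the roles from whichever sections are present; the `roles` helper and the exact nesting are assumptions, not part of the original deck.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {
            "companyIdFk": 555,
            "companyName": "Nabisco",
            "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        }},
        {"company": {
            "companyIdFk": 666,
            "companyName": "Goliath Dairies",
            "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        }},
    ],
}


def roles(doc):
    """List the roles a person plays, based on the sections the document contains."""
    found = []
    if "customer" in doc:
        found.append("customer")
    for relationship in doc.get("companyRelationships", []):
        company = relationship["company"]
        for role in company["personCompanyRelationship"]:
            found.append(f'{role}:{company["companyName"]}')
    return found


print(roles(person_doc))
```

A person with no customer section or company relationships simply yields an empty list, so new roles can be added later without changing existing documents.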

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path
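The reason for unique property names can be illustrated with a toy index keyed only by property name and value. This is a deliberately simplified model written for this example, not MarkLogic's actual index implementation; the document contents and the `index_doc` helper are hypothetical.

```python
from collections import defaultdict


def index_doc(doc_id, node, index):
    """Index every scalar value under its property name, ignoring nesting depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                if isinstance(v, (dict, list)):
                    index_doc(doc_id, v, index)
                else:
                    index[(name, v)].add(doc_id)
    elif isinstance(node, list):
        for item in node:
            index_doc(doc_id, item, index)


# Generic names: person and company both use "name"
generic = {"name": "Mike Bowers", "employer": {"name": "Nabisco"}}
# Unique names: the property name itself says what the value means
unique = {"personName": "Mike Bowers", "employer": {"companyName": "Nabisco"}}

index = defaultdict(set)
index_doc("generic.json", generic, index)
index_doc("unique.json", unique, index)

generic_names = {value for (name, value) in index if name == "name"}
company_names = {value for (name, value) in index if name == "companyName"}
print(generic_names)
print(company_names)
```

With the generic name, a lookup on `name` mixes people and companies; with unique names, a lookup on `companyName` returns only companies without needing to know the path.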


Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os

• MarkLogic indexes are in RAM, so getting a doc is 1 I/O

• MongoDB indexes are B-tree indexes and may require 4 I/Os

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
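The I/O estimate above is simple arithmetic, and a short sketch makes the range explicit. The constants are the figures quoted on the slide; the function name `relational_io_range` is made up for this illustration and is not part of any database API.

```python
# Back-of-the-envelope I/O estimate using the figures quoted on the slide.
TABLES_JOINED = 13   # tables needed to hold one person in the relational model
INDEX_IOS = (3, 4)   # low/high I/Os spent in indexes per join
ROW_IOS = (1, 2)     # low/high I/Os spent fetching the row per join
DOCUMENT_IOS = 1     # one read returns the whole JSON document


def relational_io_range(joins: int) -> tuple[int, int]:
    """Return the (low, high) I/O estimate for the given number of joins."""
    low = (INDEX_IOS[0] + ROW_IOS[0]) * joins
    high = (INDEX_IOS[1] + ROW_IOS[1]) * joins
    return low, high


low, high = relational_io_range(TABLES_JOINED)
print(f"relational: {low}-to-{high} I/Os, document: {DOCUMENT_IOS} I/O")
```

These are the slide's planning numbers, not measurements; real I/O counts depend on caching, index depth, and storage layout.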

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]

The same data in a relational model takes three tables:

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
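A minimal sketch of querying a multi-valued property, using a plain Python dictionary as a stand-in for the JSON document. The slide text lost its punctuation in extraction, so the brace placement and the `address` wrapper key are assumptions, and `addresses_for` is a hypothetical helper rather than a MarkLogic call.

```python
# Assumed reconstruction of the personAddresses fragment shown above.
person = {
    "personId": 11,
    "personAddresses": [
        {
            "address": {
                "addressPurpose": ["shipping", "mailing"],  # multi-valued
                "addressStreet": ["99 Smith Dr"],            # multi-valued
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }
        }
    ],
}


def addresses_for(doc, purpose):
    """Return every address in the document tagged with the given purpose."""
    return [
        entry["address"]
        for entry in doc.get("personAddresses", [])
        if purpose in entry["address"].get("addressPurpose", [])
    ]


print(addresses_for(person, "shipping"))
```

One address serves both purposes without a separate link table, which is the point the slide makes about capturing many-to-many relationships in a single document.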

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Different types of records in the same array are natural in programming but very difficult to do in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
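As a rough illustration of why mixed record types in one array are natural in code, the sketch below walks the two payment methods and branches on which keys are present. The dictionary layout is an assumed reconstruction of the slide's example, and `describe` is a hypothetical helper.

```python
# Two differently shaped records in the same array (assumed reconstruction).
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    },
]


def describe(method):
    """Summarize a payment method based on the keys it actually carries."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


for method in payment_methods:
    print(describe(method))
```

A relational design would typically need one table per payment type, or one wide table with mostly empty columns, to hold the same two records.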

Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 5.75
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
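To make "structure can be queried" concrete, here is a small sketch that searches a document for the presence of a key at any depth and then changes the structure with an ordinary update. The document layout is an assumed reconstruction of the example above, and `paths_to_key` is a hypothetical helper, not a database function.

```python
def paths_to_key(node, key, path=()):
    """Yield the path to every occurrence of `key`, at any nesting depth."""
    if isinstance(node, dict):
        for name, value in node.items():
            here = path + (name,)
            if name == key:
                yield here
            yield from paths_to_key(value, key, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from paths_to_key(item, key, path + (index,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query the structure: which phones carry the optional hidePhoneNumber key?
hidden = list(paths_to_key(doc, "hidePhoneNumber"))
print(hidden)

# Change the structure by updating data: add the key to the first phone.
doc["person"]["personPhones"][0]["phone"]["hidePhoneNumber"] = False
```

No schema migration is involved; the second phone simply had a key the first one lacked, and adding it is the same kind of operation as changing a value.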

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
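The type list above maps directly onto what a standard JSON parser returns. A minimal sketch using Python's built-in `json` module; the sample keys and values are invented for illustration.

```python
import json

# A small document exercising each JSON data type, including a mixed array.
raw = (
    '{"name": "Zoë", "heightInFeet": 5.75, "orders": 3, "active": true, '
    '"nickname": null, "mixed": [1, "two", false, {"nested": []}]}'
)

parsed = json.loads(raw)

# Map each key to the Python type the parser chose for its value.
types = {key: type(value).__name__ for key, value in parsed.items()}
print(types)
```

Python reports integers and decimals as `int` and `float`, although JSON itself has a single number type.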

Document PROs Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence. …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of linked nodes. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model spanning several document collections: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), and Documentation (XML). Relationship labels: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Example XML document shown in the figure:

Hospital Name      John Hopkins
Operation Number   13
Operation Type     Heart Transplant
Surgeon Name       Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size   Dose UOM
Minicillan    Drugs R Us           200         mg
Maxicillan    Canada4Less Drugs    400         mg
Minicillan    Drug USA             150         mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph?
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models?
  • Multi-Model Enterprise NoSQL Databases
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense?
  • How does NoSQL support variety and variability?
  • Choosing Between XML and JSON
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like?
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Same Realistic Customer Order Model
  • Relational Modeling: Normalize
  • DocGraph Modeling: Normalize
  • DocGraph Modeling: Denormalize
  • Relational Modeling: Orthogonalize
  • Relational Modeling: Generalize
  • DocGraph Modeling: Generalize
  • Relational Modeling: Tune
  • DocGraph Modeling: Tune
  • Document PROs: Simple, Easy Data Types

DocGraph Modeling: Generalize

• Generalizing is easy in JSON because JSON is an object that is designed for subclassing

• This example shows a person being subclassed as an employee, customer, and customer rep

• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model

• See subclassing below

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]

triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null
]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free

  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
    phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
    phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
  ]

  personAddresses [
    address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
      addressLocality Clinton addressRegion UT
      addressPostalCode 84015 addressCountry United States
  ]

  personEmails [
    email emailPurpose [ work primary ] emailAddress mike@somework.com
    email emailPurpose [ personal ] emailAddress mike@somewhere.com
  ]

  personWebsites [
    website websitePurpose [ work ] websiteUrl http://www.mike.com
    website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike
  ]

  personPaymentMethods [
    paymentMethod paymentMethodType Visa paymentMethodStatus Verified
      creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
    paymentMethod paymentMethodType Checking paymentMethodStatus Verified
      bankRoutingNumber 11111111 bankAccountNumber 11111111
      nameOnBankAccount Michael Bowers
      driversLicenseNumber 11111111 driversLicenseState UT
  ]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company companyIdFk 555 companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
  company companyIdFk 666 companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01 personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]
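The generalized document carries its own list of schemas and a set of embedded triples, and both can be read with ordinary code. The sketch below assumes each array entry is wrapped in a `schema` or `triple` object, which is a guess at the punctuation lost from the slide; `roles` and `objects_of` are hypothetical helpers, and the `person` and `customer` sections are trimmed to one property each.

```python
# Trimmed, assumed reconstruction of the generalized person document.
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": "10"}},
        {"schema": {"type": "customer", "schemaVersion": "10"}},
        {"schema": {"type": "companyRelationships", "schemaVersion": "10"}},
    ],
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [],
}


def roles(doc):
    """List the schema types, i.e. the 'subclasses' this document plays."""
    return [entry["schema"]["type"] for entry in doc.get("schemas", [])]


def objects_of(doc, predicate):
    """Return the object of every embedded triple with the given predicate."""
    return [
        entry["triple"]["object"]
        for entry in doc.get("triples", [])
        if entry["triple"]["predicate"] == predicate
    ]


print(roles(person_doc))
print(objects_of(person_doc, "hasChild"))
```

Adding a role means adding one schema entry and one top-level section to the same document, so the purpose of each section stays visible in its name.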

Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application

• Optimize sparse data out of a table and put it into one-to-one related tables

• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
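Denormalization trades redundancy for fewer lookups. A minimal sketch of the idea, using Python dictionaries as stand-ins for rows; the table and column names are illustrative only and `build_order_row` is a hypothetical helper.

```python
# Stand-in for a persons table keyed by person ID.
persons = {
    11: {"personName": "Mike Bowers", "personBirthDate": "1981-01-01"},
}


def build_order_row(order_id, customer_id, persons_by_id):
    """Create an order row that carries a copy of the customer's name."""
    customer = persons_by_id[customer_id]
    return {
        "orderId": order_id,
        "customerId": customer_id,                 # the normalized reference
        "customerName": customer["personName"],    # the denormalized copy
    }


order = build_order_row(1, 11, persons)

# Displaying the order no longer needs a join back to persons.
print(order["customerName"])
```

The cost is that a name change must now be written in two places, which is why the slide lists denormalization under tuning rather than as a default.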

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic

• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy

• You manually create inequality indexes (range indexes) based on property name or hierarchical path
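One way to check the unique-name guideline before loading documents is a small lint pass that flags property names appearing under more than one parent path. This is a plain Python sketch with hypothetical helper names; it does not call any MarkLogic API.

```python
from collections import defaultdict


def property_paths(node, path=()):
    """Yield (property name, parent path) for every property in the document."""
    if isinstance(node, dict):
        for name, value in node.items():
            yield name, path
            yield from property_paths(value, path + (name,))
    elif isinstance(node, list):
        # Array items share their parent's path.
        for item in node:
            yield from property_paths(item, path)


def ambiguous_names(doc):
    """Return names that occur under more than one distinct parent path."""
    parents = defaultdict(set)
    for name, parent_path in property_paths(doc):
        parents[name].add(parent_path)
    return {
        name: sorted(paths)
        for name, paths in parents.items()
        if len(paths) > 1
    }


generic = {"person": {"name": "Mike Bowers"}, "company": {"name": "Nabisco"}}
specific = {
    "person": {"personName": "Mike Bowers"},
    "company": {"companyName": "Nabisco"},
}

print(ambiguous_names(generic))
print(ambiguous_names(specific))
```

With the generic `name` key, a name-based equality lookup cannot tell a person from a company; the prefixed names used throughout these examples avoid that.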

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
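When each business entity is its own document, documents point at each other through foreign-key properties such as `productIdFk`. The sketch below resolves those references in application code; the document shapes are trimmed, assumed reconstructions of the order and product examples, and `resolve_products` is a hypothetical helper.

```python
# Product documents keyed by productId (trimmed to two properties).
products = {
    1111: {"productName": "Oreo Thins Sandwich Cookies", "productCode": "oreo-111"},
}

# An order document that references products by foreign key.
order = {
    "orderId": 1,
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderQty": 3}},
        {"product": {"productIdFk": 2222, "productOrderQty": 1}},
    ],
}


def resolve_products(order_doc, products_by_id):
    """Pair each ordered product ID with its product document, if present."""
    resolved = []
    for entry in order_doc["orderedProducts"]:
        product_id = entry["product"]["productIdFk"]
        resolved.append((product_id, products_by_id.get(product_id)))
    return resolved


for product_id, product in resolve_products(order, products):
    print(product_id, product["productCode"] if product else "not loaded")
```

Each entity stays whole in its own document, and only the cross-entity links need a lookup.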

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Performance

One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O
• MongoDB indexes are B-tree indexes and may require 4 I/Os
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
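The arithmetic behind the 52-to-78 figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch that reuses the numbers quoted on the slide; the function and constant names are illustrative and do not come from any database API.

```python
# Back-of-the-envelope I/O estimate using the figures quoted on the slide.
JOINS = 13               # tables joined to rebuild one person
IOS_PER_JOIN = (4, 6)    # low and high estimate per join (index reads + row read)
IOS_PER_DOCUMENT = 1     # the slide's figure for fetching one JSON document


def relational_io_range(joins, per_join):
    """Return the (low, high) number of I/Os for the given number of joins."""
    low, high = per_join
    return joins * low, joins * high


low, high = relational_io_range(JOINS, IOS_PER_JOIN)
print(f"relational: {low}-to-{high} I/Os, document: {IOS_PER_DOCUMENT} I/O")
```

Real I/O counts depend on caching, index depth, and storage layout, so the result is an estimate of relative cost rather than a measurement.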

{ "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 } }

{ "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
          … } } ] } }

{ "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks", …
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 101,
          "productWidth": 45, "productHeight": 17, "productLength": 67 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    … } }

{ "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" …
      { "phone": { "phonePurpose": "support" …
      { "phone": { "phonePurpose": "shipping" …
    "companyAddresses": [
      { "address": {
          "addressPurpose": ["shipping" …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936" …
    "companyEmails": [
      { "email": { "emailPurpose": "sales" …
    "companyWebsites": [
      { "website": { "websitePurpose": "home" …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555 …
          "paymentMethodVerified": true …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…

{ "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"] }

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

Addresses (table): Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses (table): Person ID, Address ID, Address Purpose, Is Primary

Persons (table): Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
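To make the comparison concrete, here is a minimal Python sketch of what "shredding" the multi-valued address property into the three tables looks like. The function name `shred`, the snake_case column names, and the generated address ID are assumptions made for illustration; the Is Primary column is left out for brevity.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }}
        ],
    },
}


def shred(doc):
    """Split one person document into rows for three relational tables."""
    person_id = doc["personId"]
    persons = [{"person_id": person_id, "name": doc["person"]["personName"]}]
    addresses, persons_addresses = [], []
    for address_id, item in enumerate(doc["person"]["personAddresses"], start=1):
        address = item["address"]
        addresses.append({
            "address_id": address_id,  # hypothetical surrogate key
            "street": " ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # Each value of the multi-valued property becomes its own junction row.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "person_id": person_id,
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred(person_doc)
print(len(persons), len(addresses), len(persons_addresses))
```

One address with two purposes turns into one Persons row, one Addresses row, and two Persons Addresses rows, which is the extra structure the document model avoids.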

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Mixing different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
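As a rough illustration of why mixed record types are natural in application code, the following Python sketch walks the payment-method array and handles each record according to the properties it carries. The `describe` helper is hypothetical and only the fields it needs are copied from the slide.

```python
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
    }},
]


def describe(method):
    """Summarize a payment method based on which properties are present."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized record shape)"


descriptions = [describe(item["paymentMethod"]) for item in payment_methods]
print(descriptions)
```

A relational design would typically need either one wide table with many null columns or a separate table per payment type to hold the same two records.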


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data, just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, values

{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
          "hidePhoneNumber": true } }
    ] } }
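Querying structure "for presence of keys" can be sketched in plain Python as a recursive walk over the parsed document. This is an illustrative stand-in, not a MarkLogic query; the function name and the path notation are assumptions.

```python
doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def find_objects_with_key(node, key, path=""):
    """Yield (path, object) for every JSON object that contains `key`."""
    if isinstance(node, dict):
        if key in node:
            yield path or "/", node
        for name, child in node.items():
            yield from find_objects_with_key(child, key, f"{path}/{name}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_objects_with_key(child, key, f"{path}[{index}]")


hits = list(find_objects_with_key(doc, "hidePhoneNumber"))
for path, obj in hits:
    print(path, obj["phoneNumber"])
```

Only the second phone carries the optional `hidePhoneNumber` key, so the walk returns a single hit together with the path that describes its nesting.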

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
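A short Python sketch shows how these JSON types map onto native values when a document is parsed with the standard `json` module. The sample property names `dependents`, `mixed`, and `sparse` and the sample name are made up for the example.

```python
import json

text = """
{
  "personName": "Zoë Müller",
  "personHeightInFeet": 5.75,
  "dependents": 2,
  "hidePhoneNumber": true,
  "mixed": ["a", 1, 2.5, false, null, {"nested": {}}],
  "sparse": {}
}
"""

value = json.loads(text)

# Map each top-level property to the Python type it was parsed into.
types = {key: type(item).__name__ for key, item in value.items()}
print(types)
```

The array named `mixed` holds a string, two numbers, a boolean, a null, and an object side by side, and the empty `sparse` object is valid without any attributes.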

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. CODD
IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of linked entities. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", "purpose".]

[Figure: Customer Order example in which JSON Customers, JSON Orders, JSON Products, JSON Vendors, and XML Documentation are linked by relationships: "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", "Vendor received RMA on product".]

XML example
Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size   Dose UOM
Minicillan    Drugs R Us            200         mg
Maxicillan    Canada4Less Drugs     400         mg
Minicillan    Drug USA              150         mg

Inside and Out




Relational Modeling: Tune

4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
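The sparse-data step can be sketched with Python's built-in `sqlite3` module. The table names, the maiden-name column chosen as the sparse attribute, and the second person's data are all hypothetical; the point is only that rows exist in the one-to-one table for the people who actually have a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- Sparse attribute moved into a one-to-one table keyed by person_id.
    CREATE TABLE person_maiden_names (
        person_id   INTEGER PRIMARY KEY REFERENCES persons(person_id),
        maiden_name TEXT NOT NULL
    );
    INSERT INTO persons VALUES (11, 'Mike Bowers'), (22, 'Jane Bowers');
    INSERT INTO person_maiden_names VALUES (22, 'Smith');
""")

# A LEFT JOIN restores the original wide shape when the sparse value is needed.
rows = conn.execute("""
    SELECT p.person_id, p.name, m.maiden_name
    FROM persons AS p
    LEFT JOIN person_maiden_names AS m ON m.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
print(rows)
conn.close()
```

The main table stays dense, at the cost of one more join whenever the sparse value is read, which is the trade-off the denormalization bullet then tries to win back.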


DocGraph Modeling: Tune

• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality indexes based on property name or hierarchical path

{ "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 10 } },
    { "schema": { "type": "customer", "schemaVersion": 10 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 10 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": {
    "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ] }
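The naming advice on this slide can be illustrated with a toy index in Python. This is a simplified model and not MarkLogic's actual index implementation: it keys every scalar value by property name alone, ignoring depth, which is the behavior the slide describes. The document URIs, the generic `name` key, and the `accountRep` property are hypothetical.

```python
from collections import defaultdict


def index_properties(uri, node, index, name=None):
    """Record (property name, scalar value) -> document URIs, ignoring depth."""
    if isinstance(node, dict):
        for child_name, child in node.items():
            index_properties(uri, child, index, child_name)
    elif isinstance(node, list):
        for child in node:
            index_properties(uri, child, index, name)
    elif name is not None:
        index[(name, node)].add(uri)


def build_index(docs):
    index = defaultdict(set)
    for uri, doc in docs.items():
        index_properties(uri, doc, index)
    return index


# Generic property names: "name" means different things in different places.
generic_docs = {
    "/person/11.json": {"personId": 11, "person": {"name": "Mike Bowers"}},
    "/company/555.json": {"companyId": 555, "company": {
        "name": "Nabisco", "accountRep": {"name": "Mike Bowers"}}},
}

# Unique property names: each name identifies exactly one kind of value.
unique_docs = {
    "/person/11.json": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "/company/555.json": {"companyId": 555, "company": {
        "companyName": "Nabisco", "accountRep": {"accountRepName": "Mike Bowers"}}},
}

generic_hits = sorted(build_index(generic_docs)[("name", "Mike Bowers")])
unique_hits = sorted(build_index(unique_docs)[("personName", "Mike Bowers")])
print(generic_hits)
print(unique_hits)
```

With generic names, a lookup for the person also matches the company document, so the query has to add path constraints; with unique names, the name-based lookup is already precise.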

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages


Document PROs: Simplicity

Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists



Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can containmulti-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties

using MarkLogics new Optic API

and Templated Data Extraction

Document PROs Multi-Value Properties

Any JSON data can be multi-valuedbull This allows many-to-one and many-to-many relationships

to be captured in one JSON document

personAddresses [

address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]addressLocality Clinton addressRegion UTaddressPostalCode 84015 addressCountry United States

]

Address IDStreetLocalityRegionPostal CodeCountry

Addresses

Person IDAddress IDAddress PurposeIs Primary

Persons Addresses

Person IDStatusShipping PreferenceNameOnline NameBirth DateGenderEthnicityCustomer StatusCustomer Join DateCustomer Reviewer Rank

Persons

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods [
  paymentMethod
    paymentMethodType Visa
    paymentMethodStatus Verified
    creditCardNumber 1234123412341234
    creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS
    creditCardValidationAddress mailing
  paymentMethod
    paymentMethodType Checking
    paymentMethodStatus Verified
    bankRoutingNumber 11111111
    bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers
    driversLicenseNumber 11111111
    driversLicenseState UT
]
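As a rough illustration of why mixed record types in one array are easy to work with in code, the following Python sketch walks the payment methods and branches on the properties each record carries. The JSON shape is an assumed reconstruction of the slide text, and the `describe` function is hypothetical.

```python
# Assumed reconstruction of the slide's personPaymentMethods array.
person_payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    }},
]


def describe(method):
    """Summarize one payment method, whatever set of properties it has."""
    kind = method["paymentMethodType"]
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    return f"unrecognized payment method type: {kind}"


summaries = [describe(item["paymentMethod"]) for item in person_payment_methods]
for line in summaries:
    print(line)
```

A relational design would typically need either one wide table with many nullable columns or a separate table per payment type; here both records simply live side by side in the same array.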

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 5.75
  personPhones [
    phone
      phonePurpose [ mobile ]
      phoneNumber +1 (222) 222-2222
    phone
      phonePurpose [ home ]
      phoneNumber +1 (222) 222-2224
      hidePhoneNumber true
  ]
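The idea that structure can be queried and changed like any other data can be sketched in a few lines of Python. This is only an in-memory illustration, not a MarkLogic query; the document shape is an assumed reconstruction of the slide text and `find_key_paths` is a hypothetical helper.

```python
def find_key_paths(node, key, path=()):
    """Yield the path to every occurrence of `key` anywhere in a nested JSON value."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            yield from find_key_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key_paths(value, key, path + (index,))


# Assumed reconstruction of the slide's person document.
doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query the structure: where does the optional key appear?
paths_before = list(find_key_paths(doc, "hidePhoneNumber"))

# Change the structure by updating data: give every phone the key.
for item in doc["person"]["personPhones"]:
    item["phone"].setdefault("hidePhoneNumber", False)

paths_after = list(find_key_paths(doc, "hidePhoneNumber"))
print(len(paths_before), "->", len(paths_after))
```

No schema migration is involved: the "new column" exists as soon as the data that carries it has been written.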

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
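A small Python sketch can show how these JSON types arrive in a program. It uses only the standard `json` module; the property names in the sample text are made up for illustration.

```python
import json

# Made-up sample covering each JSON type named on the slide.
text = """
{
  "a very long attribute name, with spaces & symbols!": "Jos\\u00e9",
  "integerNumber": 11,
  "decimalNumber": 5.75,
  "booleanFlag": true,
  "mixedArray": [1, "two", 3.5, false, {"nested": ["deeper"]}],
  "sparseObject": {}
}
"""

value = json.loads(text)

# Map each top-level property to the Python type it was parsed into.
type_names = {key: type(item).__name__ for key, item in value.items()}
for key, name in type_names.items():
    print(f"{name:5} <- {key}")
```

The mixed array keeps each element's own type, and the empty object is a valid value, which is what lets documents be sparsely populated.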

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: graph of linked business entities. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order example showing document collections and the relationships among them. Collections: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), and Documentation (XML). The Documentation example is a hospital operation record (Hospital Name, Operation Number, Operation Type, Surgeon Name) with a drug table (Drug Name, Drug Manufacturer, Dose Size, Dose UOM). Relationship labels: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size     Dose UOM
Minicillan    Drugs R Us            200           mg
Maxicillan    Canada4Less Drugs     400           mg
Minicillan    Drug USA              150           mg

DocGraph Modeling: Tune

• These examples are tuned for MarkLogic.

• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.

• You manually create inequality indexes based on property name or hierarchical path.
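The naming advice can be illustrated with a toy index in Python. This is not MarkLogic's actual index implementation; it is only a sketch of an index keyed on (property name, value) that ignores where the property sits in the hierarchy. The function `index_document`, the document ids, and the trimmed sample documents are all hypothetical.

```python
from collections import defaultdict


def index_document(index, doc_id, node, name=None):
    """Record every (property name, scalar value) pair, ignoring nesting depth."""
    if isinstance(node, dict):
        for child_name, value in node.items():
            index_document(index, doc_id, value, child_name)
    elif isinstance(node, list):
        # Array items inherit the name of the property that holds the array.
        for value in node:
            index_document(index, doc_id, value, name)
    elif name is not None:
        index[(name, node)].add(doc_id)


# Trimmed, assumed reconstructions of two documents from the slides.
docs = {
    "person-11": {
        "personId": 11,
        "person": {"personName": "Mike Bowers"},
        "companyRelationships": [
            {"company": {"companyIdFk": 555, "companyName": "Nabisco"}},
        ],
    },
    "company-555": {
        "companyId": 555,
        "company": {"companyName": "Nabisco", "companyStatus": "active"},
    },
}

index = defaultdict(set)
for doc_id, doc in docs.items():
    index_document(index, doc_id, doc)

print(sorted(index[("companyName", "Nabisco")]))
print(sorted(index[("companyId", 555)]))
print(sorted(index[("companyIdFk", 555)]))
```

Because the lookup key is the name alone, the shared name `companyName` matches both documents, while the distinct names `companyId` and `companyIdFk` separate the company itself from a document that merely refers to it.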

personId 11

schemas [
  schema type person schemaVersion 10
  schema type customer schemaVersion 10
  schema type companyRelationships schemaVersion 10
]

triples [
  triple subject 11 predicate hasSpouse object 22
  triple subject 11 predicate hasChild object 33
  triple subject 11 predicate hasChild object 44
  triple subject 11 predicate likesProduct object 1111
  triple subject 11 predicate hasEmployer object 666
    tripleStatus active
    tripleCreatedOn 2016-10-05
    tripleEffectivityStartDate 1966-06-06
    tripleEffectivityEndDate null
]

person
  personStatus active
  personName Mike Bowers
  personOnlineName MTB1
  personNameVariations
    personFirstName Mike
    personLastName Bowers
    middleName Thomas
    personMaidenName null
    personLegalName Michael Thomas Bowers
    personSortedName Bowers Mike
    personNickname Mikey
    personInformalLetterName Mike
    personFormalLetterName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free

  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
    phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
    phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113
  ]

  personAddresses [
    address
      addressPurpose [ shipping mailing ]
      addressStreet [ 99 Smith Dr ]
      addressLocality Clinton
      addressRegion UT
      addressPostalCode 84015
      addressCountry United States
  ]

  personEmails [
    email emailPurpose [ work primary ] emailAddress mike@somework.com
    email emailPurpose [ personal ] emailAddress mike@somewhere.com
  ]

  personWebsites [
    website websitePurpose [ work ] websiteUrl http://www.mike.com
    website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike
  ]

  personPaymentMethods [
    paymentMethod
      paymentMethodType Visa
      paymentMethodStatus Verified
      creditCardNumber 1234123412341234
      creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS
      creditCardValidationAddress mailing
    paymentMethod
      paymentMethodType Checking
      paymentMethodStatus Verified
      bankRoutingNumber 11111111
      bankAccountNumber 11111111
      nameOnBankAccount Michael Bowers
      driversLicenseNumber 11111111
      driversLicenseState UT
  ]

customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

companyRelationships [
  company
    companyIdFk 555
    companyName Nabisco
    personCompanyRelationship [ employeeOf accountRepFor ]
    personCompanyStatus active
    personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01
    personCompanyEndDate null
    accountRepSupportedProducts [
      product productIdFk 1111 productName Oreos
    ]
  company
    companyIdFk 666
    companyName Goliath Dairies
    personCompanyRelationship [ independentContractorFor deliveryPersonFor ]
    personCompanyStatus active
    personCompanyEffectiveness Effective
    personCompanyStartDate 2001-01-01
    personCompanyEndDate null
    deliverySchedule 2 pm Mon-Fri
    performanceNotes He is always late
]
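The triples embedded in this document can be read as a small graph around person 11. Below is a minimal Python sketch, assuming each triple is a plain subject/predicate/object record as the slide text suggests; the helper `objects_of` is hypothetical, and a real deployment would query triples through the database rather than in application code.

```python
# Assumed reconstruction of the triples listed in the person document.
triples = [
    {"subject": 11, "predicate": "hasSpouse", "object": 22},
    {"subject": 11, "predicate": "hasChild", "object": 33},
    {"subject": 11, "predicate": "hasChild", "object": 44},
    {"subject": 11, "predicate": "likesProduct", "object": 1111},
    {"subject": 11, "predicate": "hasEmployer", "object": 666},
]


def objects_of(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`, in document order."""
    return [
        triple["object"]
        for triple in triples
        if triple["subject"] == subject and triple["predicate"] == predicate
    ]


children = objects_of(triples, 11, "hasChild")
employers = objects_of(triples, 11, "hasEmployer")
# The set of predicates is open-ended: a new relationship is just another triple.
predicates = sorted({triple["predicate"] for triple in triples})

print(children, employers, predicates)
```

Each predicate names the meaning of the link, so adding a new kind of relationship needs no new table or foreign-key column.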

Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

Document PROs: Simplicity

Each Business Entity Is a JSON Document
• The five JSON documents in this example (person, order, product, company, and lookup lists) equal more than 30 tables in a relational model

• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists

Document PROs: Performance

One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
  • You have to join 13 tables to retrieve all of a person's data
  • Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
  • 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
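The IO estimate on this slide is simple arithmetic, worked through here in Python. The per-join figures are the slide's own rough estimates, not measurements, and the constant names are made up for the sketch.

```python
# Figures taken from the slide; names are illustrative only.
TABLES_JOINED = 13
INDEX_IOS_PER_JOIN = (3, 4)   # low and high estimate
ROW_IOS_PER_JOIN = (1, 2)     # low and high estimate

# IOs for a single join: index lookups plus the row fetch.
ios_per_join = (
    INDEX_IOS_PER_JOIN[0] + ROW_IOS_PER_JOIN[0],
    INDEX_IOS_PER_JOIN[1] + ROW_IOS_PER_JOIN[1],
)

# IOs to assemble one person from 13 tables.
relational_ios = tuple(ios * TABLES_JOINED for ios in ios_per_join)

# The slide's figure for fetching the same person as one document
# when the indexes are already in RAM.
document_ios = 1

print(f"per join: {ios_per_join[0]}-to-{ios_per_join[1]} IOs")
print(f"relational: {relational_ios[0]}-to-{relational_ios[1]} IOs")
print(f"document: {document_ios} IO")
```

The low end is 4 x 13 = 52 and the high end is 6 x 13 = 78, which is where the slide's 52-to-78 range comes from.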

Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can containmulti-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties

using MarkLogics new Optic API

and Templated Data Extraction

Document PROs Multi-Value Properties

Any JSON data can be multi-valuedbull This allows many-to-one and many-to-many relationships

to be captured in one JSON document

personAddresses [

address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]addressLocality Clinton addressRegion UTaddressPostalCode 84015 addressCountry United States

]

Address IDStreetLocalityRegionPostal CodeCountry

Addresses

Person IDAddress IDAddress PurposeIs Primary

Persons Addresses

Person IDStatusShipping PreferenceNameOnline NameBirth DateGenderEthnicityCustomer StatusCustomer Join DateCustomer Reviewer Rank

Persons

Document PROs Multi-Type Arrays

Any JSON data can be multi-typedbull Different types of records in the same array is natural to do in programming but very difficult to do in

relational databases

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus VerifiedcreditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing

paymentMethod paymentMethodType Checking paymentMethodStatus VerifiedbankRoutingNumber 11111111 bankAccountNumber 11111111nameOnBankAccount Michael BowersdriversLicenseNumber 11111111 driversLicenseState UT

]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs Everything is DataEverything is data in JSON

bull Structurebull Keysbull Valuesbull Arrays

Because structure is databull Structure can be changed by updating data

mdash just write a querybull Structure can be queried

bull For presence of keysbull For nested relationshipsbull For any combination of nesting keys values

personId 11

person personName Some very long name with many UTF-8 international charactershellippersonBirthDate 1982-02-02personHeightInFeet 575personPhones [ phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222

phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224hidePhoneNumber true

]

Document PROs Simple Easy Data Typesndash Attribute names have unlimited length and can contain any charactersndash Strings have unlimited size and can contain any internationalized UTF-8 charactersndash Numbers contain integers or decimalsndash Booleans are true or falsendash Arrays can have values of any type and can mix typesndash Objects have unlimited number of attributes can be nested and can be sparsely populated

Document PROs Meaningful Data Modeling

49

A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Communications of the ACM, Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1. Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of related entities.
Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event.
Edge labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

Customer Order

[Figure: multi-model diagram.
Document collections: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), Documentation (XML).
Annotations: product that was ordered; vendor who sold the product; shipping instructions, etc.; customer invoices, annual reports, etc.; customer order documents; product manual; customer liked product; vendor received RMA on product.]

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size   Dose UOM
Minicillan    Drugs R Us           200         mg
Maxicillan    Canada4Less Drugs    400         mg
Minicillan    Drug USA             150         mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49



Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages

personId: 11
person:
  personName: Mike Bowers
  personBirthDate: 1981-01-01
  personGender: male
  personEthnicity: caucasian
  personShippingPreference: free
  personPhones: [
    phone:
      phonePurpose: [ mobile, primary ], phoneNumber: +1 (111) 111-1111
  ]
  personAddresses: [
    address:
      addressPurpose: [ shipping, mailing ], addressStreet: [ 99 Smith Dr ]
      addressLocality: Clinton, addressRegion: UT
      addressPostalCode: 84015, addressCountry: United States
  ]
  personEmails: [
    email:
      emailPurpose: [ work, primary ], emailAddress: mikesomeworkcom
  ]
  personWebsites: [
    website:
      websitePurpose: [ work ], websiteUrl: httpwwwmikecom
  ]
  personPaymentMethods: [
    paymentMethod:
      paymentMethodType: Visa, paymentMethodStatus: Verified
      creditCardNumber: 1234123412341234, creditCardExpirationDate: 2011-11-11
      nameOnCreditCard: MIKE BOWERS, creditCardValidationAddress: mailing
  ]
customer:
  customerStatus: active
  customerJoinDate: 2001-01-01
  customerReviewerRank: 11111

orderId: 1
order:
  orderNumber: 111-11-111
  orderDate: 2001-01-01T090000Z
  orderStatus: Shipped
  customer:
    customerId: 11
    customerName: Mike Bowers
  orderShipment:
    shipper:
      companyIdFk: 777, companyName: FedEx
    shipmentMethod: 2nd Day Air, shipmentCost: 587, shipmentNotes
  orderShippingAddress:
    addressStreet: [ 99 Smith Dr ]
    addressLocality: Clinton, addressRegion: UT
    addressPostalCode: 84015, addressCountry: United States
  orderedProducts: [
    product:
      productIdFk: 1111
      productName: Oreo Thins Sandwich Cookies
      productOrderQty: 3, productUnitPrice: 350, productCondition: New
      productWeightOz: 11
      productOrderStatus: Shipped
      productShippedDate: 2001-01-01+0101
      seller:
        sellerId: 555, sellerName: Nabisco
      supplier:
        supplierId: 555, supplierName: Nabisco
    product:
      productIdFk: 2222
      productName: Rice Dream Rice Drink - Vanilla 64 Fl Oz
      productOrderQty: 1, productUnitPrice: 2 50, productCondition: New

productId: 1111
product:
  productCode: oreo-111
  productStatus: active
  productName: Oreo Thins Sandwich Cookies
  productDescription: YUMMY
  productCategories: [ cookies, chocolate cookies, desserts, snacks
  productTags: [ cookie, chocolate, snack, treat ]
  productDetailDescriptionURL: httpmysalessitecomcdndetailsoreo-111
  productAvailabilityDate: 2001-01-01
  productListPrice: 450
  productDiscountPrice: 350
  productStandardCost: 250
  productInventoryReorderLevel: 100
  productInventoryTargetLevel: 1000
  productVendor:
    vendorId: 555, companyName: Nabisco
  productSuppliers: [
    productSupplier:
      supplierId: 555, companyName: Nabisco
      standardSupplierProductPrice: 250, currentSupplierProductPrice: 2
      supplierProductQuantityPerUnit: 1 box of 35 cookies, supplierProdu
  productMeasures: [
    productMeasure:
      productMeasurePurpose: productPackage, productWeight: 101
      productWidth: 45, productHeight: 17, productLength: 67
    productMeasure:
      productMeasurePurpose: shippingPackage, productWeight: 1
      productWidth: 5, productHeight: 2, productLength: 8
  d tW b it [

companyId: 555
company:
  companyName: Nabisco
  companyStatus: active
  companyPhones: [
    phone:
      phonePurpose: sales
    phone:
      phonePurpose: support
    phone:
      phonePurpose: shipping
  companyAddresses: [
    address:
      addressPurpose: [ shipping
      addressLocality: East Hanove
      addressPostalCode: 07936
  companyEmails: [
    email:
      emailPurpose: sales
  companyWebsites: [
    website:
      websitePurpose: home
  companyPaymentMethods: [
    paymentMethod:
      paymentMethodType: Che
      paymentMethodAccountNumber: 555
      paymentMethodVerified: true
supplier:
  supplierJoinDate: 2005-05
seller:
  sellerJoinDate: 2005-05

orderLookupId: 111111111
orderStatus: [ New, Invoiced, Shipped, Closed ]
productOrderStatus: [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Document PROs: Simplicity

Each Business Entity Is a JSON Document
  • These five JSON documents equal more than 30 tables in a relational model
  • JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists


Document PROs: Performance

One JSON document requires one I/O
  • Relational requires dozens of I/Os for the same data
  • For example, the person document is 45K and requires 13 tables to represent it in a relational database
      • You have to join 13 tables to retrieve all of a person's data
      • Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
      • 4-to-6 I/Os x 13 joins = 52-to-78 I/Os
  • MarkLogic indexes are in RAM, so getting a doc is 1 I/O
  • MongoDB indexes are B-tree indexes and may require 4 I/Os
  • To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
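The I/O estimate above is simple arithmetic, and a short Python sketch makes the range explicit. The function name `relational_io_range` and its parameters are assumptions for this illustration; the per-join figures (3-to-4 index I/Os plus 1-to-2 row I/Os) are the ones quoted on the slide, not measurements.

```python
def relational_io_range(tables, index_ios=(3, 4), row_ios=(1, 2)):
    """Estimate the (low, high) I/O count for joining `tables` tables.

    Each join is assumed to cost some index I/Os plus some row I/Os,
    using the per-join ranges quoted on the slide as defaults.
    """
    low = tables * (index_ios[0] + row_ios[0])
    high = tables * (index_ios[1] + row_ios[1])
    return low, high


# The person entity spans 13 tables in the relational model.
low, high = relational_io_range(13)

# One document read, assuming (as the slide does) that indexes are in RAM.
document_ios = 1
```

With the slide's figures, 13 joins at 4-to-6 I/Os each give a range of 52 to 78 I/Os, against a single I/O for the document.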


Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
  • This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses: [
  address:
    addressPurpose: [ shipping, mailing ], addressStreet: [ 99 Smith Dr ]
    addressLocality: Clinton, addressRegion: UT
    addressPostalCode: 84015, addressCountry: United States
]

Equivalent relational tables:
  • Addresses: Address ID, Street, Locality, Region, Postal Code, Country
  • Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
  • Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
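To make the comparison concrete, the Python sketch below shreds one person document with a multi-valued `personAddresses` array into rows for the three relational tables named on the slide. The function name `shred_person`, the snake_case column names, the generated address IDs, and the exact JSON nesting are assumptions for this illustration.

```python
# Hypothetical person document; the nesting is assumed from the slide text.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def shred_person(doc):
    """Split one document into rows for Persons, Addresses, Persons Addresses."""
    person_id = doc["personId"]
    persons = [{"person_id": person_id, "name": doc["person"]["personName"]}]
    addresses = []
    persons_addresses = []
    for address_id, item in enumerate(doc["person"]["personAddresses"], start=1):
        address = item["address"]
        addresses.append({
            "address_id": address_id,
            "street": ", ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postal_code": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        # Each purpose of a multi-valued property becomes its own link row.
        for purpose in address["addressPurpose"]:
            persons_addresses.append({
                "person_id": person_id,
                "address_id": address_id,
                "address_purpose": purpose,
            })
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred_person(person_doc)
```

One address with two purposes already produces four rows across three tables, while the document keeps the same information in a single nested array.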

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
  • Storing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods: [
  paymentMethod:
    paymentMethodType: Visa, paymentMethodStatus: Verified
    creditCardNumber: 1234123412341234, creditCardExpirationDate: 2011-11-11
    nameOnCreditCard: MIKE BOWERS, creditCardValidationAddress: mailing
  paymentMethod:
    paymentMethodType: Checking, paymentMethodStatus: Verified
    bankRoutingNumber: 11111111, bankAccountNumber: 11111111
    nameOnBankAccount: Michael Bowers
    driversLicenseNumber: 11111111, driversLicenseState: UT
]
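A short Python sketch shows how application code might consume such a multi-typed array, dispatching on `paymentMethodType`. The `describe` helper, its output wording, and the JSON nesting are assumptions for this illustration; the field names and values are taken from the slide.

```python
# Two records of different shapes in one array, as on the slide.
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
    }},
]


def describe(method):
    """Summarize a payment method using only the fields its type carries."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"card ending {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"bank account ending {method['bankAccountNumber'][-4:]}"
    return f"unrecognized payment method type: {kind}"


summaries = [describe(item["paymentMethod"]) for item in payment_methods]
```

Each record carries only the fields that apply to its type, so no nullable columns or per-type subtables are needed.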





Document PROs: Performance

One JSON document requires one IO

• Relational requires dozens of IOs for the same data

• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database

• You have to join 13 tables to retrieve all of a person's data

• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row

• 4-to-6 IOs x 13 joins = 52-to-78 IOs

• MarkLogic indexes are in RAM, so getting a doc is 1 IO

• MongoDB indexes are B-tree indexes and may require 4 IOs

• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
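The IO estimate on this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the per-join figures are the ones quoted on the slide, and the constant and function names are hypothetical, chosen for illustration.

```python
# Figures quoted on the slide (not measurements).
TABLES_TO_JOIN = 13
INDEX_IOS_PER_JOIN = (3, 4)  # (low, high) IOs spent walking indexes
ROW_IOS_PER_JOIN = (1, 2)    # (low, high) IOs spent fetching the row
DOCUMENT_IOS = 1             # one IO to fetch a whole JSON document


def relational_io_range(joins=TABLES_TO_JOIN):
    """Return the (low, high) IO estimate for joining `joins` tables."""
    low = (INDEX_IOS_PER_JOIN[0] + ROW_IOS_PER_JOIN[0]) * joins
    high = (INDEX_IOS_PER_JOIN[1] + ROW_IOS_PER_JOIN[1]) * joins
    return low, high


if __name__ == "__main__":
    low, high = relational_io_range()
    print(f"relational: {low}-to-{high} IOs; document: {DOCUMENT_IOS} IO")
```

The low end is (3 + 1) x 13 and the high end is (4 + 2) x 13, which is where the 52-to-78 range comes from.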

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Templated Data Extraction

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued

• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses: [
  address:
    addressPurpose: [ shipping, mailing ], addressStreet: [ 99 Smith Dr ]
    addressLocality: Clinton, addressRegion: UT
    addressPostalCode: 84015, addressCountry: United States
]

Relational tables:

Addresses: Address ID, Street, Locality, Region, Postal Code, Country

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
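A minimal Python sketch of the contrast above: the same "shipping address" lookup against one document with multi-valued properties, and against the three relational tables. The exact JSON nesting, the snake_case column names, and the `is_primary` values are assumptions made for illustration; the slide text does not preserve them.

```python
# One document: the address and its purposes live inside the person.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {
                "addressPurpose": ["shipping", "mailing"],  # multi-valued
                "addressStreet": ["99 Smith Dr"],           # multi-valued
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }
        ],
    },
}

# Relational equivalent: three tables, rows as dicts (hypothetical columns).
persons = [{"person_id": 11, "name": "Mike Bowers"}]
addresses = [{"address_id": 1, "street": "99 Smith Dr", "locality": "Clinton",
              "region": "UT", "postal_code": "84015",
              "country": "United States"}]
persons_addresses = [
    {"person_id": 11, "address_id": 1, "address_purpose": "shipping",
     "is_primary": True},
    {"person_id": 11, "address_id": 1, "address_purpose": "mailing",
     "is_primary": False},
]


def shipping_addresses_from_doc(doc):
    """Filter the embedded array; no join is needed."""
    return [a for a in doc["person"]["personAddresses"]
            if "shipping" in a["addressPurpose"]]


def shipping_addresses_from_tables(person_id):
    """Join the link table to the address table by address_id."""
    by_id = {a["address_id"]: a for a in addresses}
    return [by_id[link["address_id"]] for link in persons_addresses
            if link["person_id"] == person_id
            and link["address_purpose"] == "shipping"]


if __name__ == "__main__":
    print(shipping_addresses_from_doc(person_doc)[0]["addressLocality"])
    print(shipping_addresses_from_tables(11)[0]["locality"])
```

Note that the relational version needs one link row per purpose, while the document keeps both purposes in a single array on the address itself.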

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed

• Storing different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods: [
  paymentMethod: paymentMethodType: Visa, paymentMethodStatus: Verified
    creditCardNumber: 1234123412341234, creditCardExpirationDate: 2011-11-11
    nameOnCreditCard: MIKE BOWERS, creditCardValidationAddress: mailing
  paymentMethod: paymentMethodType: Checking, paymentMethodStatus: Verified
    bankRoutingNumber: 11111111, bankAccountNumber: 11111111
    nameOnBankAccount: Michael Bowers
    driversLicenseNumber: 11111111, driversLicenseState: UT
]
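The multi-typed array above can be handled in application code by inspecting which keys each record carries. The sketch below is illustrative only: the field names come from the slide, while the flat record shape and the `describe` helper are assumptions.

```python
# Two differently shaped records in one array, as on the slide.
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    },
]


def describe(method):
    """Summarize a payment method based on the keys it actually has."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


if __name__ == "__main__":
    for m in payment_methods:
        print(describe(m))
```

A relational design would typically need either one wide table with many nullable columns or a separate table per payment type; here both shapes simply share an array.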

Document PROs: Everything is Data

Everything is data in JSON

• Structure
• Keys
• Values
• Arrays

Because structure is data

• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values

personId: 11
person:
  personName: Some very long name with many UTF-8 international characters…
  personBirthDate: 1982-02-02
  personHeightInFeet: 5.75
  personPhones: [
    phone: phonePurpose: [ mobile ], phoneNumber: +1 (222) 222-2222
    phone: phonePurpose: [ home ], phoneNumber: +1 (222) 222-2224, hidePhoneNumber: true
  ]
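To make "structure is data" concrete, here is a small Python sketch that queries a document for the presence of a key and then changes the structure with an ordinary update. It works on an in-memory dict rather than a database, and the helper names and the simplified document shape are assumptions for illustration.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
             "hidePhoneNumber": True},
        ],
    },
}


def has_key(node, key):
    """Return True if `key` appears anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def phones_with_key(doc, key):
    """Query structure: which phone records carry the given key?"""
    return [p for p in doc["person"]["personPhones"] if key in p]


def add_default_key(doc, key, default):
    """Change structure by updating data: add `key` where it is missing."""
    for phone in doc["person"]["personPhones"]:
        phone.setdefault(key, default)
    return doc
```

Before the update only the home phone has `hidePhoneNumber`; after `add_default_key(person_doc, "hidePhoneNumber", False)` every phone record has it, with no schema migration step.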

Document PROs: Simple, Easy Data Types

– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have unlimited number of attributes, can be nested, and can be sparsely populated
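The data types listed above can be seen in a short round trip through Python's standard `json` module. This is a sketch with made-up keys and values; note also that real parsers and databases may impose their own size limits even though the JSON format itself does not.

```python
import json

doc = {
    "a key with spaces, symbols & ünïcödé": "Zoë Müller 山田",  # any characters
    "integer": 11,
    "decimal": 5.75,
    "boolean": True,
    "mixedArray": [1, "two", 3.0, False, None, {"nested": []}],  # mixed types
    "sparseObject": {},                                           # no attributes set
}

# Serialize to JSON text (keeping non-ASCII characters) and parse it back.
text = json.dumps(doc, ensure_ascii=False)
round_tripped = json.loads(text)

if __name__ == "__main__":
    print(text)
```

After the round trip the parsed value compares equal to the original, including the mixed-type array and the empty object.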

Document PROs Meaningful Data Modeling


A Relational Model of Data for Large Shared Data Banks

E. F. CODD

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.

[Figure: graph of connected business entities. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Relationship labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order model. Nodes: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), Documentation (XML, illustrated by a hospital operation record with a drug table). Relationship labels: Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product.]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name     Drug Manufacturer     Dose Size     Dose UOM
Minicillan    Drugs R Us            200           mg
Maxicillan    Canada4Less Drugs     400           mg
Minicillan    Drug USA              150           mg

Page 43: Becoming a Document Modeling Guru - cdn1.marklogic.com · by Michael Bowers 2017-05-01 v. 5.4 Becoming a Data Modeling Guru 2 2017 MarkLogic World mike@cssDesignPatterns.com

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs Performance

One JSON document requires one IObull Relational requires dozens of IOs for the same databull For example the person document is 45K and requires 13 tables

to represent it in a relational databasebull You have to join 13 tables to retrieve all of a persons databull Each join requires 4-to-6 IOs 3-to-4 IOs for indexes and 1-to-2 IOs for a rowbull 4 IOs x 13 joins = 52-to-78 IOs

bull MarkLogic indexes are in RAM so getting a doc is 1 IO

bull MongoDB indexes are B-tree indexes and may require 4 IOs

bull To further tune JSON we may optionally denormalize data by copying common data from other business entities mdash just like we do in relational tables

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs Multi-Valued and Multi-Typed Properties

JSON documents can containmulti-valued and multi-typed properties

JSON databases can query multi-valued and multi-typed properties

using MarkLogics new Optic API

and Templated Data Extraction

Document PROs Multi-Value Properties

Any JSON data can be multi-valuedbull This allows many-to-one and many-to-many relationships

to be captured in one JSON document

personAddresses [

address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]addressLocality Clinton addressRegion UTaddressPostalCode 84015 addressCountry United States

]

Address IDStreetLocalityRegionPostal CodeCountry

Addresses

Person IDAddress IDAddress PurposeIs Primary

Persons Addresses

Person IDStatusShipping PreferenceNameOnline NameBirth DateGenderEthnicityCustomer StatusCustomer Join DateCustomer Reviewer Rank

Persons

Document PROs Multi-Type Arrays

Any JSON data can be multi-typedbull Different types of records in the same array is natural to do in programming but very difficult to do in

relational databases

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus VerifiedcreditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing

paymentMethod paymentMethodType Checking paymentMethodStatus VerifiedbankRoutingNumber 11111111 bankAccountNumber 11111111nameOnBankAccount Michael BowersdriversLicenseNumber 11111111 driversLicenseState UT

]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId 555

company

companyName Nabisco

companyStatus active

companyPhones [

phone phonePurpose sales

phone phonePurpose support

phone phonePurpose shipping

companyAddresses [

address

addressPurpose [ shipping

addressLocality East Hanove

addressPostalCode 07936

companyEmails [

email emailPurpose sales

companyWebsites [

website websitePurpose home

companyPaymentMethods [

paymentMethod

paymentMethodType Che

paymentMethodAccountNumber 555

paymentMethodVerified true

supplier supplierJoinDate 2005-05

seller sellerJoinDate 2005-05

orderLookupId 111111111orderStatus [NewInvoicedShippedClosed

]productOrderStatus [

None AllocatedInvoiced Shipped On Order No Stock

]

Document PROs Everything is DataEverything is data in JSON

bull Structurebull Keysbull Valuesbull Arrays

Because structure is databull Structure can be changed by updating data

mdash just write a querybull Structure can be queried

bull For presence of keysbull For nested relationshipsbull For any combination of nesting keys values

personId 11

person personName Some very long name with many UTF-8 international charactershellippersonBirthDate 1982-02-02personHeightInFeet 575personPhones [ phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222

phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224hidePhoneNumber true

]

Document PROs Simple Easy Data Typesndash Attribute names have unlimited length and can contain any charactersndash Strings have unlimited size and can contain any internationalized UTF-8 charactersndash Numbers contain integers or decimalsndash Booleans are true or falsendash Arrays can have values of any type and can mix typesndash Objects have unlimited number of attributes can be nested and can be sparsely populated

Document PROs Meaningful Data Modeling

49

A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6

June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.

[Figure: graph of connected nodes. Node legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels: "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", "purpose".]

[Figure: Customer Order model spanning several document types: Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), and Documentation (XML). Relationship labels: "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", "Vendor received RMA on product".]

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49

Hospital Name:     John Hopkins
Operation Number:  13
Operation Type:    Heart Transplant
Surgeon Name:      Dorothy Oz

Drug Name     Drug Manufacturer    Dose Size    Dose UOM
Minicillan    Drugs R Us           200          mg
Maxicillan    Canada4Less Drugs    400          mg
Minicillan    Drug USA             150          mg
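The operation record and its repeating drug rows can be held in one hierarchical document instead of being repeated across flat rows. The sketch below is one possible way to do that in Python; the property names (hospitalName, drugs, and so on) and the to_document helper are hypothetical and were chosen only for illustration.

```python
# Flat rows as they would appear in one denormalized table: the four
# operation fields repeat on every drug row.
rows = [
    ("John Hopkins", 13, "Heart Transplant", "Dorothy Oz",
     "Minicillan", "Drugs R Us", 200, "mg"),
    ("John Hopkins", 13, "Heart Transplant", "Dorothy Oz",
     "Maxicillan", "Canada4Less Drugs", 400, "mg"),
    ("John Hopkins", 13, "Heart Transplant", "Dorothy Oz",
     "Minicillan", "Drug USA", 150, "mg"),
]


def to_document(flat_rows):
    """Fold rows that share the same operation into one nested document."""
    hospital, number, op_type, surgeon = flat_rows[0][:4]
    return {
        "hospitalName": hospital,
        "operationNumber": number,
        "operationType": op_type,
        "surgeonName": surgeon,
        # The repeating group becomes an array nested inside the operation.
        "drugs": [
            {
                "drugName": row[4],
                "drugManufacturer": row[5],
                "doseSize": row[6],
                "doseUom": row[7],
            }
            for row in flat_rows
        ],
    }


operation = to_document(rows)
```

The operation fields are stored once, and the three drugs travel with the operation they belong to.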


personId: 11
person:
  personName: Mike Bowers
  personBirthDate: 1981-01-01
  personGender: male
  personEthnicity: caucasian
  personShippingPreference: free
  personPhones: [
    phone:
      phonePurpose: [ mobile, primary ]
      phoneNumber: +1 (111) 111-1111
  ]
  personAddresses: [
    address:
      addressPurpose: [ shipping, mailing ]
      addressStreet: [ 99 Smith Dr ]
      addressLocality: Clinton
      addressRegion: UT
      addressPostalCode: 84015
      addressCountry: United States
  ]
  personEmails: [
    email:
      emailPurpose: [ work, primary ]
      emailAddress: mike@somework.com
  ]
  personWebsites: [
    website:
      websitePurpose: [ work ]
      websiteUrl: http://www.mike.com
  ]
  personPaymentMethods: [
    paymentMethod:
      paymentMethodType: Visa
      paymentMethodStatus: Verified
      creditCardNumber: 1234123412341234
      creditCardExpirationDate: 2011-11-11
      nameOnCreditCard: MIKE BOWERS
      creditCardValidationAddress: mailing
  ]
customer:
  customerStatus: active
  customerJoinDate: 2001-01-01
  customerReviewerRank: 11111

orderId: 1
order:
  orderNumber: 111-11-111
  orderDate: 2001-01-01T09:00:00Z
  orderStatus: Shipped
  customer:
    customerId: 11
    customerName: Mike Bowers
  orderShipment:
    shipper:
      companyIdFk: 777
      companyName: FedEx
    shipmentMethod: 2nd Day Air
    shipmentCost: 5.87
    shipmentNotes:
  orderShippingAddress:
    addressStreet: [ 99 Smith Dr ]
    addressLocality: Clinton
    addressRegion: UT
    addressPostalCode: 84015
    addressCountry: United States
  orderedProducts: [
    product:
      productIdFk: 1111
      productName: Oreo Thins Sandwich Cookies
      productOrderQty: 3
      productUnitPrice: 3.50
      productCondition: New
      productWeightOz: 11
      productOrderStatus: Shipped
      productShippedDate: 2001-01-01+0101
      seller:
        sellerId: 555
        sellerName: Nabisco
      supplier:
        supplierId: 555
        supplierName: Nabisco
    product:
      productIdFk: 2222
      productName: Rice Dream Rice Drink - Vanilla 64 Fl Oz
      productOrderQty: 1
      productUnitPrice: 2.50
      productCondition: New
      …

productId: 1111
product:
  productCode: oreo-111
  productStatus: active
  productName: Oreo Thins Sandwich Cookies
  productDescription: YUMMY
  productCategories: [ cookies, chocolate cookies, desserts, snacks …
  productTags: [ cookie, chocolate, snack, treat ]
  productDetailDescriptionURL: http://mysalessite.com/cdn/details/oreo-111
  productAvailabilityDate: 2001-01-01
  productListPrice: 4.50
  productDiscountPrice: 3.50
  productStandardCost: 2.50
  productInventoryReorderLevel: 100
  productInventoryTargetLevel: 1000
  productVendor:
    vendorId: 555
    companyName: Nabisco
  productSuppliers: [
    productSupplier:
      supplierId: 555
      companyName: Nabisco
      standardSupplierProductPrice: 2.50
      currentSupplierProductPrice: 2…
      supplierProductQuantityPerUnit: 1 box of 35 cookies
      supplierProdu…
  productMeasures: [
    productMeasure:
      productMeasurePurpose: productPackage
      productWeight: 101
      productWidth: 45
      productHeight: 17
      productLength: 67
    productMeasure:
      productMeasurePurpose: shippingPackage
      productWeight: 1
      productWidth: 5
      productHeight: 2
      productLength: 8
  productWebsites: [ …

companyId: 555
company:
  companyName: Nabisco
  companyStatus: active
  companyPhones: [
    phone: phonePurpose: sales …
    phone: phonePurpose: support …
    phone: phonePurpose: shipping …
  companyAddresses: [
    address:
      addressPurpose: [ shipping …
      addressLocality: East Hanove…
      addressPostalCode: 07936 …
  companyEmails: [
    email: emailPurpose: sales …
  companyWebsites: [
    website: websitePurpose: home …
  companyPaymentMethods: [
    paymentMethod:
      paymentMethodType: Che…
      paymentMethodAccountNumber: 555…
      paymentMethodVerified: true
supplier:
  supplierJoinDate: 2005-05…
seller:
  sellerJoinDate: 2005-05…

orderLookupId: 111111111
orderStatus: [ New, Invoiced, Shipped, Closed ]
productOrderStatus: [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]

Document PROs: Multi-Valued and Multi-Typed Properties

JSON documents can contain multi-valued and multi-typed properties.

JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).

Document PROs: Multi-Value Properties

Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses: [
  address:
    addressPurpose: [ shipping, mailing ]
    addressStreet: [ 99 Smith Dr ]
    addressLocality: Clinton
    addressRegion: UT
    addressPostalCode: 84015
    addressCountry: United States
]

The same data in relational tables:

Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank

Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary

Addresses: Address ID, Street, Locality, Region, Postal Code, Country
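To make the contrast concrete, here is a rough Python sketch that shreds one person document into rows for the three tables listed above. The shred helper, the surrogate address IDs it generates, and the reduced column set (no Is Primary column, only the name in Persons) are assumptions made for brevity.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {
                "address": {
                    "addressPurpose": ["shipping", "mailing"],
                    "addressStreet": ["99 Smith Dr"],
                    "addressLocality": "Clinton",
                    "addressRegion": "UT",
                    "addressPostalCode": "84015",
                    "addressCountry": "United States",
                }
            }
        ],
    },
}


def shred(doc):
    """Split one person document into rows for three relational tables."""
    person_id = doc["personId"]
    persons = [(person_id, doc["person"]["personName"])]
    addresses = []
    persons_addresses = []
    # Assumption: address IDs are surrogate keys generated here, starting at 1.
    for address_id, entry in enumerate(doc["person"]["personAddresses"], start=1):
        address = entry["address"]
        addresses.append((
            address_id,
            " ".join(address["addressStreet"]),
            address["addressLocality"],
            address["addressRegion"],
            address["addressPostalCode"],
            address["addressCountry"],
        ))
        # The multi-valued purpose turns into one junction row per value.
        for purpose in address["addressPurpose"]:
            persons_addresses.append((person_id, address_id, purpose))
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred(person_doc)
```

One address with two purposes already needs four rows across three tables, while the document keeps the purposes as a two-item array next to the address they describe.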

Document PROs: Multi-Type Arrays

Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases

personPaymentMethods: [
  paymentMethod:
    paymentMethodType: Visa
    paymentMethodStatus: Verified
    creditCardNumber: 1234123412341234
    creditCardExpirationDate: 2011-11-11
    nameOnCreditCard: MIKE BOWERS
    creditCardValidationAddress: mailing
  paymentMethod:
    paymentMethodType: Checking
    paymentMethodStatus: Verified
    bankRoutingNumber: 11111111
    bankAccountNumber: 11111111
    nameOnBankAccount: Michael Bowers
    driversLicenseNumber: 11111111
    driversLicenseState: UT
]
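A short Python sketch of working with such a mixed array follows. It is only an illustration: the describe helper and its label wording are hypothetical, and the records are written as plain dictionaries without the paymentMethod wrapper to keep the example small.

```python
payment_methods = [
    {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    },
    {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    },
]


def describe(method):
    """Return a masked label, branching on which properties the record has."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending in {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending in {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


labels = [describe(method) for method in payment_methods]
```

Both records share the type and status properties and differ in everything else, yet they sit in the same array and are processed by the same loop.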


Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values
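The idea that structure can be queried and changed like any other data can be sketched in plain Python. This is not MarkLogic's query API; the has_path and find_key helpers and the renamed birthDate key are hypothetical names used only to illustrate the three kinds of structural query listed above.

```python
def has_path(node, path):
    """Return True if the sequence of keys in `path` can be followed from `node`."""
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def find_key(node, key):
    """Yield every value stored under `key` at any depth of nesting."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query structure: a nested relationship, and a key at any depth.
has_birth_date = has_path(doc, ["person", "personBirthDate"])
phone_numbers = list(find_key(doc, "phoneNumber"))
hidden_flags = list(find_key(doc, "hidePhoneNumber"))

# Change structure by updating data: rename a key in place.
doc["person"]["birthDate"] = doc["person"].pop("personBirthDate")
```

No schema migration is involved in the rename; the document's shape changed because its data changed.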



Page 46: Becoming a Document Modeling Guru - cdn1.marklogic.com · by Michael Bowers 2017-05-01 v. 5.4 Becoming a Data Modeling Guru 2 2017 MarkLogic World mike@cssDesignPatterns.com

Document PROs Multi-Type Arrays

Any JSON data can be multi-typedbull Different types of records in the same array is natural to do in programming but very difficult to do in

relational databases

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus VerifiedcreditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing

paymentMethod paymentMethodType Checking paymentMethodStatus VerifiedbankRoutingNumber 11111111 bankAccountNumber 11111111nameOnBankAccount Michael BowersdriversLicenseNumber 11111111 driversLicenseState UT

]

personId 11

person

personName Mike Bowers

personBirthDate 1981-01-01

personGender male

personEthnicity caucasian

personShippingPreference free

personPhones [

phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]

personAddresses [

address

addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States ]

personEmails [

email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]

personWebsites [

website websitePurpose [ work ] websiteUrl httpwwwmikecom ]

personPaymentMethods [

paymentMethod paymentMethodType Visa paymentMethodStatus Verified

creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11

nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]

customer

customerStatus active

customerJoinDate 2001-01-01

customerReviewerRank 11111

orderId 1

order

orderNumber 111-11-111

orderDate 2001-01-01T090000Z

orderStatus Shipped

customer

customerId 11

customerName Mike Bowers

orderShipment

shipper companyIdFk 777 companyName FedEx

shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes

orderShippingAddress

addressStreet [ 99 Smith Dr ]

addressLocality Clinton addressRegion UT

addressPostalCode 84015 addressCountry United States

orderedProducts [

product

productIdFk 1111

productName Oreo Thins Sandwich Cookies

productOrderQty 3 productUnitPrice 350 productCondition New

productWeightOz 11

productOrderStatus Shipped

productShippedDate 2001-01-01+0101

seller sellerId 555 sellerName Nabisco

supplier supplierId 555 supplierName Nabisco

product

productIdFk 2222

productName Rice Dream Rice Drink - Vanilla 64 Fl Oz

productOrderQty 1 productUnitPrice 2 50 productCondition New

productId 1111

product

productCode oreo-111

productStatus active

productName Oreo Thins Sandwich Cookies

productDescription YUMMY

productCategories [ cookies chocolate cookies desserts snacks

productTags [ cookie chocolate snack treat ]

productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111

productAvailabilityDate 2001-01-01

productListPrice 450

productDiscountPrice 350

productStandardCost 250

productInventoryReorderLevel 100

productInventoryTargetLevel 1000

productVendor vendorId 555 companyName Nabisco

productSuppliers [

productSupplier supplierId 555 companyName Nabisco

standardSupplierProductPrice 250 currentSupplierProductPrice 2

supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu

productMeasures [

productMeasure

productMeasurePurpose productPackage productWeight 101

productWidth 45 productHeight 17 productLength 67

productMeasure

productMeasurePurpose shippingPackage productWeight 1

productWidth 5 productHeight 2 productLength 8

d tW b it [

companyId: 555
company:
  companyName: Nabisco
  companyStatus: active
  companyPhones: [
    phone:
      phonePurpose: sales
    phone:
      phonePurpose: support
    phone:
      phonePurpose: shipping
  companyAddresses: [
    address:
      addressPurpose: [ shipping
      addressLocality: East Hanove
      addressPostalCode: 07936
  companyEmails: [
    email:
      emailPurpose: sales
  companyWebsites: [
    website:
      websitePurpose: home
  companyPaymentMethods: [
    paymentMethod:
      paymentMethodType: Che
      paymentMethodAccountNumber: 555
      paymentMethodVerified: true
  supplier:
    supplierJoinDate: 2005-05
  seller:
    sellerJoinDate: 2005-05

orderLookupId: 111111111
orderStatus: [ New, Invoiced, Shipped, Closed ]
productOrderStatus: [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
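The documents above point at each other through keys such as productIdFk, and the lookup document lists the allowed status values. As a rough sketch of how those two pieces could be used together, the Python below pairs each ordered product with the product document it references and checks status values against the lookup lists. Plain dictionaries stand in for the database here; the names products_by_id, join_products, and invalid_statuses are hypothetical and are not part of any MarkLogic API, and fields that are cut off in the slides are left out rather than guessed.

```python
# Hypothetical, trimmed-down copies of the documents above. Only values that are
# legible in the slides are used; truncated fields are omitted, not invented.
products_by_id = {
    1111: {"productCode": "oreo-111", "productName": "Oreo Thins Sandwich Cookies"},
}

order_lookup = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": [
        "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock",
    ],
}

order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"productIdFk": 1111, "productOrderQty": 3, "productOrderStatus": "Shipped"},
        {"productIdFk": 2222, "productOrderQty": 1},  # status not legible in the slide
    ],
}


def join_products(order_doc, products):
    """Pair each order line with the product document its productIdFk references.

    A missing product yields None, so the caller can see unresolved references.
    """
    return [
        (line, products.get(line["productIdFk"]))
        for line in order_doc.get("orderedProducts", [])
    ]


def invalid_statuses(order_doc, lookup):
    """Return (key, value) pairs whose value is not in the lookup's allowed list."""
    problems = []
    status = order_doc.get("orderStatus")
    if status not in lookup["orderStatus"]:
        problems.append(("orderStatus", status))
    for line in order_doc.get("orderedProducts", []):
        line_status = line.get("productOrderStatus")
        # A line without a status is skipped rather than reported.
        if line_status is not None and line_status not in lookup["productOrderStatus"]:
            problems.append(("productOrderStatus", line_status))
    return problems


joined = join_products(order, products_by_id)
problems = invalid_statuses(order, order_lookup)
```

Keeping the product name inside the order (denormalized) while also carrying productIdFk means the order can be read on its own, and the join is only needed when the full product document is wanted.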

Document PROs: Everything is Data

Everything is data in JSON
• Structure
• Keys
• Values
• Arrays

Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, values
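To make "structure is data" concrete, here is a minimal Python sketch that treats a parsed JSON document as an ordinary nested value. It asks whether a key is present anywhere, finds the nesting path to each occurrence, and then changes the structure with a plain update. The helpers has_key and paths_to are hypothetical names for illustration, and the exact nesting of the phone entries is an assumption reconstructed from the flattened slide text. A database would answer such questions from its indexes rather than by walking each document.

```python
def has_key(node, key):
    """Return True if `key` appears anywhere in the nested structure."""
    if isinstance(node, dict):
        return key in node or any(has_key(value, key) for value in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def paths_to(node, key, prefix=()):
    """Yield the path (a tuple of keys and list indexes) to every occurrence of `key`."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield prefix + (name,)
            yield from paths_to(value, key, prefix + (name,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from paths_to(item, key, prefix + (index,))


# Assumed shape of the person document (nesting reconstructed from the slide).
person = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Query structure: presence of a key, and where it sits in the nesting.
key_present = has_key(person, "hidePhoneNumber")
phone_paths = list(paths_to(person, "phoneNumber"))

# Change structure by updating data: add the key to the phone that lacks it.
person["person"]["personPhones"][0]["phone"]["hidePhoneNumber"] = False
hide_paths = list(paths_to(person, "hidePhoneNumber"))
```

No schema change is involved in the last step: the first phone simply gains a key that only the second phone had before.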

personId: 11
person:
  personName: Some very long name with many UTF-8 international characters…
  personBirthDate: 1982-02-02
  personHeightInFeet: 5.75
  personPhones: [
    phone:
      phonePurpose: [ mobile ]
      phoneNumber: +1 (222) 222-2222
    phone:
      phonePurpose: [ home ]
      phoneNumber: +1 (222) 222-2224
      hidePhoneNumber: true
  ]

Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
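The data types listed above can be tried out with Python's standard json module. The sketch below builds one document that uses each type, serializes it, and parses it back. The key names and values are made up for illustration. The "unlimited" sizes describe the JSON format itself; a particular parser or database may still impose its own limits.

```python
import json

# One illustrative document that uses every JSON data type.
doc = {
    "a key with spaces, punctuation & ünïcödé": "名前 and other UTF-8 text",
    "integer": 11,
    "decimal": 5.75,
    "boolean": True,
    "mixedArray": [1, "two", 3.5, False, None, {"nested": []}],
    "sparseObject": {},  # objects need not carry any particular attributes
}

# ensure_ascii=False keeps the international characters readable in the output.
text = json.dumps(doc, ensure_ascii=False)
round_tripped = json.loads(text)
```

The parsed result compares equal to the original, and the mixed array keeps each element's type.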

Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks

E. F. Codd

IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency… The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…

1.2.1 Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
…Can application programs…remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.
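The access path dependence that Codd describes can be illustrated with a small Python sketch. A function that follows one hard-coded path through a tree-shaped document breaks as soon as the structure changes, while a lookup that searches by key name keeps working. This only illustrates the problem statement; it is not Codd's solution, and the helper names and the extra "parties" level are hypothetical.

```python
def by_path(doc):
    """Read the customer name through one fixed access path."""
    return doc["order"]["customer"]["customerName"]


def find_first(node, key):
    """Depth-first search for `key`; return its value, or None if it is absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_first(child, key)
        if found is not None:
            return found
    return None


before = {"order": {"customer": {"customerName": "Mike Bowers"}}}
# The same data after a structural change: a hypothetical "parties" level is added.
after = {"order": {"parties": {"customer": {"customerName": "Mike Bowers"}}}}

name_before = by_path(before)

try:
    by_path(after)
    path_survived = True
except KeyError:
    # The program depended on the continued existence of the old path.
    path_survived = False

name_after = find_first(after, "customerName")
```

Searching by key trades the fixed path for a scan of the document, which is the kind of lookup an index over keys and values is meant to make cheap.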

[Figure: graph of connected business entities. Legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Relationship labels: geo-located in, printed on, author of, published article in journal, publisher of, problem, purpose, solution.]

[Figure: Customer Order example linking several kinds of content: JSON Customers, JSON Orders, JSON Products, JSON Vendors, an XML record, and Documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manual). Relationship labels: Product that was ordered; Vendor who sold the product; Customer liked product; Vendor received RMA on product.]

XML example record:
Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name    Drug Manufacturer    Dose Size    Dose UOM
Minicillan   Drugs R Us           200          mg
Maxicillan   Canada4Less Drugs    400          mg
Minicillan   Drug USA             150          mg

Inside and Out

  • Becoming a Document Modeling Guru
  • Becoming a Data Modeling Guru
  • Abstract
  • Why Document Graph
  • About the Author
  • Church of Jesus Christ of Latter-day Saints
  • Slide Number 7
  • Slide Number 8
  • Six Data Paradigms
  • Top Ten Databases Overall
  • Do we combine multiple models
  • Multi-Model Enterprise NoSQL Databases
  • Slide Number 13
  • WAIT A MINUTE
  • RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
  • Documents + RDF Graphs = NextGen Relational
  • RDF = Standard Meaningful Graphs
  • New Data Structures for Variety and Variability
  • New Data Structures for Variety and Variability
  • Why is relational fixed and dense
  • How does NoSQL support variety and variability
  • Choosing Between XML and JSON
  • Slide Number 23
  • Simple Customer Order Relational Model
  • What would a real Customer Data Model look like
  • Customer Model
  • Person ERD
  • One Person JSON Doc = 13 Tables
  • Slide Number 29
  • Same Realistic Customer Order Model
  • Slide Number 31
  • Relational Modeling Normalize
  • DocGraph Modeling Normalize
  • DocGraph Modeling Denormalize
  • Relational Modeling Orthogonalize
  • Slide Number 36
  • Relational Modeling Generalize
  • DocGraph Modeling Generalize
  • Relational Modeling Tune
  • DocGraph Modeling Tune
  • Slide Number 41
  • Slide Number 42
  • Slide Number 43
  • Slide Number 44
  • Slide Number 45
  • Slide Number 46
  • Slide Number 47
  • Document PROs Simple Easy Data Types
  • Slide Number 49



personId: 11
person:
  personName: Mike Bowers
  personBirthDate: 1981-01-01
  personGender: male
  personEthnicity: caucasian
  personShippingPreference: free
  personPhones: [
    phone:
      phonePurpose: [ mobile, primary ]
      phoneNumber: +1 (111) 111-1111
  ]
  personAddresses: [
    address:
      addressPurpose: [ shipping, mailing ]
      addressStreet: [ 99 Smith Dr ]
      addressLocality: Clinton
      addressRegion: UT
      addressPostalCode: 84015
      addressCountry: United States
  ]
  personEmails: [
    email:
      emailPurpose: [ work, primary ]
      emailAddress: mike@somework.com
  ]
  personWebsites: [
    website:
      websitePurpose: [ work ]
      websiteUrl: http://www.mike.com
  ]
  personPaymentMethods: [
    paymentMethod:
      paymentMethodType: Visa
      paymentMethodStatus: Verified
      creditCardNumber: 1234123412341234
      creditCardExpirationDate: 2011-11-11
      nameOnCreditCard: MIKE BOWERS
      creditCardValidationAddress: mailing
  ]
customer:
  customerStatus: active
  customerJoinDate: 2001-01-01
  customerReviewerRank: 11111
