Creating a Single View Part 2: Data Design & Loading Strategies
Buzz Moschetti, Enterprise Architect, MongoDB
#ConferenceHashTag

Creating a Single View: Data Design and Loading Strategies


DESCRIPTION

Learn how to design a single view application and load your data into the application.


Page 1: Creating a Single View: Data Design and Loading Strategies

Enterprise Architect, MongoDB

Buzz Moschetti

#ConferenceHashTag

Creating a Single View Part 2: Data Design & Loading Strategies

Page 2

Who Is Talking To You?

• Yes, I use “Buzz” on my business cards

• Former Investment Bank Chief Architect at JPMorgan Chase and Bear Stearns before that

• Over 27 years of designing and building systems
  • Big and small
  • Super-specialized to broadly useful in any vertical
  • “Traditional” to completely disruptive
  • Advocate of language leverage and strong factoring
  • Inventor of Perl DBI/DBD

• Still programming – using emacs, of course

Page 3

What Is He Going To Talk About?

Historic Challenges

New Strategy for Success

Technical examples and tips

Creating A Single View series:

• Part 1: Overview & Data Analysis
• Part 2: Data Design & Loading Strategies
• Part 3: Securing Your Deployment

Page 4

Historic Challenges

Page 5

It’s 2014: Why is this still hard to do?

• Business / Technical / Information Challenges

• Missteps in evolution of data transfer technology

[Diagram: source system A feeding single view X]

Page 6

We wish this “just worked”

• A: Query objects from A with great performance
• B: Query objects from B with great performance
• X: Query objects from merged A and B with great performance

Page 7

…but Beware The Blue Arrow!

[Diagram: the blue arrow is the transfer of data from A to X]

• Extracting many tables into many files
• Some tables require more than one file to capture representation
• Encoding/formatting clever tricks
• Reconciliation
• Different extracts for different consumers
• Different extracts for different versions of data to same consumer

Page 8

Loss of fidelity exposed

class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}

widget1,,3,,good texture,retains value,,,20142304,102.3,201401
widget2,XS,6,,,,not fragile,,,20132304,73,87653
widget3,XT,,,4,,dense,shiny,mysterious,,,19990304,73,87653,,
widget4,,,3,4,,,,,,20040101,,999999,,

[Diagram labels: A, ORM]
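To make the fidelity loss concrete, here is a minimal Python sketch of the kind of flattening an extract performs. The record, the MAX_FEATURES column budget, and the ';' delimiter are hypothetical choices for illustration, not the format of any real system A.

```python
import csv
import io

# Hypothetical product record; field names loosely follow the slide's class.
product = {
    "productName": "widget1",
    "features": ["good texture", "retains value"],
    "introDate": "20140204",
    "versDates": ["20100103", "20100601"],
    "unitBundles": [1, 3, 9],
}

MAX_FEATURES = 4  # assumed fixed number of feature columns in the extract


def to_csv_row(p: dict) -> list:
    # Flat-file tricks: pad the variable-length feature list to fixed columns
    # and pack the other lists into ';'-delimited strings.
    feats = p["features"] + [""] * (MAX_FEATURES - len(p["features"]))
    return [p["productName"], *feats, p["introDate"],
            ";".join(p["versDates"]),
            ";".join(str(n) for n in p["unitBundles"])]


buf = io.StringIO()
csv.writer(buf).writerow(to_csv_row(product))

# Reading the row back: every value is now a string, and only the consumer's
# prior knowledge of the padding and delimiters can recover the lists.
row = next(csv.reader(io.StringIO(buf.getvalue())))
```

The row comes back as eight strings: the two empty padding columns look just like missing data, and unitBundles arrives as "1;3;9" rather than a list of integers.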

Page 9

What happened to XML?

class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}

<product>
  <name>widget1</name>
  <features>
    <feature>
      <text>good texture</text>
      <type>A</type>
    </feature>
  </features>
  <introDate>20140204</introDate>
  <versDates>
    <versDate>20100103</versDate>
    <versDate>20100601</versDate>
  </versDates>
  <unitBundles>1,3,9</unitBun…

Page 10

XML: Created More Issues Than Solved

<product>
  <name>widget1</name>
  <features>
    <feature>
      <text>good texture</text>
      <type>A</type>
    </feature>
  </features>
  <introDate>20140204</introDate>
  <versDates>
    <versDate>20100103</versDate>
    <versDate>20100601</versDate>
  </versDates>
  <unitBundles>1,3,9</unitBun…

• No native handling of arrays

• Attribute vs. nested tag rules/conventions widely variable

• Generic parsing (DOM) yields a tree of Nodes of Strings – not very friendly

• SAX is fast but too low level

Page 11

… and it eventually became this

<p name="widget1" ftxt1="good texture" ftyp1="A" idt="20140203" …
<p name="widget2" ftxt1="not fragile" ftyp1="A" idt="20110117" …
<p name="widget3" ftxt1="dense" idt="20140203" …
<p name="widget4" idt="20140203" versD="20130403,20130104,20100605" …

• Short, cryptic, conflated tag names

• Everything is a string attribute

• Mix of flattened arrays and delimited strings

• Irony: org.xml.sax.Attributes easier to deal with than rest of DOM

Page 12

Schema Change Challenges: Multiplied & Concentrated!

• A: Alter table(s), extract more data; LOE (level of effort) = x1
• B: Alter table(s), extract more data; LOE = x2
• C: Alter table(s), extract more data; LOE = x3
• X: Alter table(s), split() more data, once for each incoming feed

where f() is nonlinear wrt n

Page 13

SLAs & Security: Tough to Combine

• A: User 1 entitled to see X; User 2 entitled to see Y
• B: User 1 entitled to see Z; User 2 entitled to see V

Entitlements managed per-system/per-application in A and B…
…are lost in the low-fidelity transfer of data…
…and have to be reconstituted in X… somehow…

Page 14

Solving The Problem with mongoDB

Page 15

What We Are Building Today

Page 16

Overall Strategy For Success

• Let the source systems’ entities drive the data design, not the physical database

• Capture data in full fidelity

• Perform cross-ref and additional logic at the single point of view, not in transit

Page 17

Don’t forget the power of the API

class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}

If you can, avoid files altogether!

[Diagram label: Haskell]
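A minimal Python sketch of the "avoid files" idea, assuming the producer can call the database driver directly. Feature, Product, to_document, and publish are hypothetical names; a Python dataclass stands in for the slide's Java class, and `insert` is any callable that stores one document (with pymongo, a collection's insert_one would fit). The object maps straight to a document, so arrays, nesting, and real date types never pass through a text format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Iterable, List, Optional


@dataclass
class Feature:
    text: str
    type: str


@dataclass
class Product:  # Python stand-in for the slide's Java class
    productName: str
    ff: List[Feature] = field(default_factory=list)
    introDate: Optional[datetime] = None
    versDates: List[datetime] = field(default_factory=list)
    unitBundles: List[int] = field(default_factory=list)


def to_document(p: Product) -> dict:
    # Lists stay lists, nested objects become subdocuments, and datetimes stay
    # datetimes (a driver such as pymongo stores them as BSON dates).
    return {
        "name": p.productName,
        "features": [{"text": f.text, "type": f.type} for f in p.ff],
        "introDate": p.introDate,
        "versDates": list(p.versDates),
        "unitBundles": list(p.unitBundles),
    }


def publish(products: Iterable[Product], insert: Callable[[dict], object]) -> int:
    # `insert` writes one document; returns how many documents were handed off.
    count = 0
    for p in products:
        insert(to_document(p))
        count += 1
    return count


# Usage with a plain list standing in for the database.
captured: List[dict] = []
publish([Product("widget1", [Feature("good texture", "A")],
                 datetime(2014, 2, 4), [datetime(2010, 1, 3)], [1, 3, 9])],
        captured.append)
```

Passing the insert function in, rather than hard-wiring a driver, keeps the mapping testable and lets the same code feed any target.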

Page 18

But if you are creating files: emit JSON

class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}

{
  "name": "widget1",
  "features": [
    { "text": "good texture", "type": "A" }
  ],
  "introDate": "20140204",
  "versDates": ["20100103", "20100601"],
  "unitBundles": [1, 3, 7, 9]
  // …
}
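When a file is unavoidable, the standard library is enough to emit the structure above. This is a sketch under assumptions: the dataclass field names are chosen to match the slide's JSON, and dates are kept as "YYYYMMDD" strings exactly as shown there.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Feature:
    text: str
    type: str


@dataclass
class Product:  # field names chosen to match the JSON on the slide
    name: str
    features: List[Feature] = field(default_factory=list)
    introDate: str = ""  # kept as a "YYYYMMDD" string, as on the slide
    versDates: List[str] = field(default_factory=list)
    unitBundles: List[int] = field(default_factory=list)


def to_json(p: Product) -> str:
    # asdict() recurses into nested dataclasses and lists, so the feature
    # objects and the arrays survive instead of being flattened into columns.
    return json.dumps(asdict(p))


line = to_json(Product("widget1", [Feature("good texture", "A")], "20140204",
                       ["20100103", "20100601"], [1, 3, 7, 9]))
```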

Page 19

Let The Feeding System Express Itself

• A: { "name": "widget1", "features": [ { "text": "good texture", "type": "A" } ] }
• B: { "myColors": ["red", "blue"], "myFloats": [3.14159, 2.71828], "nest": { "as": { "deep": true } } }
• C: { "myBlob": { "$binary": "aGVsbG8K", "$type": "00" }, "myDate": { "$date": "2013-04-05T00:00:00.000Z" } }

Page 20

What if you forgot something?

{
  "name": "widget1",
  "features": [
    { "text": "good texture", "type": "A" }
  ],
  "introDate": "20140204",
  "versDates": ["20100103", "20100601"],
  "versMinorNum": [1, 3, 7, 9]
  // …
}

{
  "name": "widget1",
  "features": [
    { "text": "good texture", "type": "A" }
  ],
  "coverage": ["NY", "NJ"],
  "introDate": "20140204",
  "versDates": ["20100103", "20100601"],
  "versMinorNum": [1, 3, 7, 9]
  // …
}
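Both versions of the record can sit side by side; the reader just has to tolerate the field being absent. A minimal Python sketch (coverage_of and covers are hypothetical helpers, and the elided fields are dropped). In MongoDB itself, a query such as {"coverage": "NY"} likewise matches documents whose coverage array contains "NY" and simply skips documents without the field.

```python
import json

# The two versions of the record from the slide (elided fields omitted).
BEFORE = '{"name": "widget1", "versMinorNum": [1, 3, 7, 9]}'
AFTER = '{"name": "widget1", "coverage": ["NY", "NJ"], "versMinorNum": [1, 3, 7, 9]}'


def coverage_of(doc: dict) -> list:
    # Older documents simply lack "coverage"; the reader supplies a default
    # instead of anyone altering a table or re-cutting an extract.
    return doc.get("coverage", [])


def covers(docs, region: str) -> list:
    # Rough analogue of the query {"coverage": region}: keep documents whose
    # coverage array contains the value.
    return [d for d in docs if region in coverage_of(d)]


docs = [json.loads(BEFORE), json.loads(AFTER)]
```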

Page 21

The Joy (and value) of mongoDB

• A: Alter table(s), extract more data; LOE = 0.25 × x1
• B: Alter table(s), extract more data; LOE = 0.25 × x2
• C: Alter table(s), extract more data; LOE = 0.25 × x3

Page 22

Helpful Hints

Page 23

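Helpful Hint: Use the APIs

# curs: an open DB-API cursor on the source RDBMS; coll: a pymongo collection
curs.execute("select A.did, A.fullname, B.number from contact A "
             "left outer join phones B on A.did = B.did order by A.did")

lastDID = None   # did of the contact currently being assembled
contact = None
for q in curs.fetchall():
    if q[0] != lastDID:
        # New contact: flush the previous one, then start a fresh document
        if lastDID is not None:
            coll.insert_one(contact)
        contact = {"did": q[0], "name": q[1]}
        lastDID = q[0]

    # Outer join: a contact with no phones arrives with number = NULL (None)
    if q[2] is not None:
        if 'phones' not in contact:
            contact['phones'] = []
        contact['phones'].append({"number": q[2]})

# Flush the last contact
if lastDID is not None:
    coll.insert_one(contact)

{
  "did": "D159308",
  "phones": [
    {"number": "1-666-444-3333"},
    {"number": "1-999-444-3333"},
    {"number": "1-999-444-9999"}
  ],
  "name": "Buzz"
}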
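The same fold can be tried without a live RDBMS or MongoDB. In this sketch an in-memory sqlite3 database with hypothetical rows plays the source system, and itertools.groupby replaces the manual lastDID bookkeeping. It depends on ORDER BY A.did just as the loop above does, because groupby only merges adjacent rows.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter


def contacts_with_phones(conn):
    # Same left outer join as the slide; ordering by did keeps each
    # contact's rows adjacent so groupby() can fold them together.
    rows = conn.execute(
        "select A.did, A.fullname, B.number from contact A "
        "left outer join phones B on A.did = B.did order by A.did")
    for did, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        doc = {"did": did, "name": group[0][1]}
        # A contact with no phones yields a single row whose number is None.
        phones = [{"number": r[2]} for r in group if r[2] is not None]
        if phones:
            doc["phones"] = phones
        yield doc


# Hypothetical demo data standing in for the source system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table contact (did text, fullname text);
    create table phones (did text, number text);
    insert into contact values ('D159308', 'Buzz'), ('D200001', 'Steve');
    insert into phones values ('D159308', '1-666-444-3333'),
                              ('D159308', '1-999-444-3333');
""")
docs = list(contacts_with_phones(conn))
# With pymongo, each doc could then go to coll.insert_one(doc),
# or the whole list to coll.insert_many(docs).
```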

Page 24

Helpful Hint: Declare Types

Use mongoDB conventions for dates and binary data:

{"dateA": {"$date": "2014-05-16T09:42:57.112-0000"}}
{"dateB": {"$date": 1400617865438}}
{"someBlob": { "$binary": "YmxhIGJsYSBibGE=", "$type": "00" }}
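A feeding system can emit these typed wrappers with only the standard library. This is a sketch, not a full Extended JSON implementation: it covers just the two conventions shown (dates as epoch milliseconds, binary as base64 with subtype "00") and assumes timezone-aware datetimes. If pymongo is available, its bson.json_util module provides dumps/loads that handle these conventions more completely.

```python
import base64
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _typed(value):
    # json.dumps calls this hook only for values it cannot serialize itself.
    if isinstance(value, datetime):
        # Milliseconds since the epoch, as in the "dateB" form above.
        # Assumes an aware datetime; naive ones would raise TypeError here.
        return {"$date": (value - EPOCH) // timedelta(milliseconds=1)}
    if isinstance(value, (bytes, bytearray)):
        # Base64 payload plus the binary subtype, as in "someBlob".
        return {"$binary": base64.b64encode(value).decode("ascii"), "$type": "00"}
    raise TypeError(f"cannot encode {type(value).__name__}")


def to_typed_json(doc: dict) -> str:
    return json.dumps(doc, default=_typed)


line = to_typed_json({"dateB": datetime(2014, 1, 1, tzinfo=timezone.utc),
                      "someBlob": b"bla bla bla"})
```

Declaring types this way means the loader never has to guess whether "20140101" is a date, a number, or a string.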

Page 25

Helpful Hint: Keep the file flexible

Use newline-delimited JSON, one document per line:

{ "name": "buzz", "locale": "NY" }
{ "name": "steve", "locale": "UK" }
{ "name": "john", "locale": "NY" }

…instead of a giant array:

records = [
  { "name": "buzz", "locale": "NY" },
  { "name": "steve", "locale": "UK" },
  { "name": "john", "locale": "NY" }
]
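Why one document per line helps: each line parses on its own, so a reader can stream the file instead of loading one giant array. A minimal Python sketch (read_ndjson is a hypothetical helper). For loading, mongoimport reads this one-document-per-line layout by default, while the giant array needs its --jsonArray option.

```python
import io
import json
from typing import Iterator, TextIO


def read_ndjson(stream: TextIO) -> Iterator[dict]:
    # Lines parse independently, so the file can be streamed, split,
    # concatenated, or grepped without holding it all in memory.
    for lineno, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. after concatenating files
        try:
            doc = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {lineno}: {err}") from err
        yield doc


sample = io.StringIO(
    '{"name": "buzz", "locale": "NY"}\n'
    '{"name": "steve", "locale": "UK"}\n'
    '\n'
    '{"name": "john", "locale": "NY"}\n'
)
names = [doc["name"] for doc in read_ndjson(sample)]
```

A bad record is reported with its line number, and every record before it has already been processed, which is hard to do with a single array that fails to parse as a whole.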

Page 26

Helpful Hint: A quick sidebar on jq

$ cat myData
{ "name": "dave", "type": "mobile", "phones": [ { "type": "mobile", "number": "2123455634", "dnc": false }, { "type": "mobile", "number": "6173455634" }, { "type": "land", "number": "2023455634" } ] }
{ "name": "bob", "type": "WFH", "phones": [ { "type": "land", "number": "70812342342", "dnc": false }, { "type": "land", "number": "7083455634" } ] }

(another 99,998 rows)

Page 27

Helpful Hint: jq is JSON awk/sed/grep

$ jq -c '.phones[] | select(.dnc == false and .type == "mobile")' myData
{"dnc":false,"number":"2123455634","type":"mobile"}
{"dnc":false,"number":"70812342342","type":"mobile"}

$ jq [expression above] | wc -l
32433

$ gzip -c -d myData.gz | jq [expression above] | wc -l
32433

http://stedolan.github.io/jq/
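To spell out what the jq filter does, here is a rough Python equivalent (dnc_false_mobiles is a hypothetical name). `.phones[]` iterates each record's phones array and `select(...)` keeps the elements for which the condition holds. One deliberate difference, noted in the code: jq reports an error when .phones is missing, while this sketch skips such records.

```python
import json
from typing import Iterable, Iterator


def dnc_false_mobiles(lines: Iterable[str]) -> Iterator[dict]:
    # Rough analogue of:
    #   jq -c '.phones[] | select(.dnc == false and .type == "mobile")'
    for line in lines:
        if not line.strip():
            continue
        # Unlike jq, a record without "phones" is skipped rather than an error.
        for phone in json.loads(line).get("phones", []):
            # As in jq, an absent "dnc" (null) is not equal to false.
            if phone.get("dnc") is False and phone.get("type") == "mobile":
                yield phone


my_data = [
    '{"name": "dave", "type": "mobile", "phones": [{"type": "mobile", "number": "2123455634", "dnc": false}, {"type": "mobile", "number": "6173455634"}, {"type": "land", "number": "2023455634"}]}',
    '{"name": "bob", "type": "WFH", "phones": [{"type": "land", "number": "70812342342", "dnc": false}, {"type": "land", "number": "7083455634"}]}',
]
hits = list(dnc_false_mobiles(my_data))
```

On the two sample records above, only dave's first phone passes the filter.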

Page 28

Helpful Hint: Don’t be afraid of metadata

Use a version number in each document:
{ "v": 1, "name": "buzz", "locale": "NY" }
{ "v": 1, "name": "steve", "locale": "UK" }
{ "v": 2, "name": "john", "region": "NY" }

…or get fancier and use a header record:
{ "vers": 1, "creator": "ID", "createDate": … }
{ "name": "buzz", "locale": "NY" }
{ "name": "steve", "locale": "UK" }
{ "name": "john", "locale": "NY" }
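With a per-document version number, the consumer can dispatch on it instead of guessing from which fields happen to be present. A minimal Python sketch, assuming (as in the example) that v1 records carry "locale" and v2 records carry "region"; the unified output shape is a hypothetical choice for illustration.

```python
def normalize(doc: dict) -> dict:
    # Dispatch on the per-document version; v1 uses "locale", v2 "region".
    # The unified output shape (always "region") is an assumption.
    version = doc["v"]
    if version == 1:
        return {"name": doc["name"], "region": doc["locale"]}
    if version == 2:
        return {"name": doc["name"], "region": doc["region"]}
    raise ValueError(f"unsupported document version: {version}")


rows = [
    {"v": 1, "name": "buzz", "locale": "NY"},
    {"v": 1, "name": "steve", "locale": "UK"},
    {"v": 2, "name": "john", "region": "NY"},
]
normalized = [normalize(r) for r in rows]
```

Failing loudly on an unknown version is the design choice here: a new feed format surfaces as an error at load time rather than as silently missing fields later.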

Page 29

Helpful Hint: Use a batch ID

{ "vers": 1, "batchID": "B213W", "createDate": … }
{ "name": "buzz", "locale": "NY" }
{ "name": "steve", "locale": "UK" }
{ "name": "john", "locale": "NY" }
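One way a loader might use the header is to stamp its batch ID onto every document it loads. A minimal Python sketch, assuming the first line is always the header record; read_batch is a hypothetical name and the header's elided createDate is left out. Once loaded, an entire batch can be found or backed out by that one field, for example with pymongo's collection.delete_many({"batchID": "B213W"}).

```python
import json
from typing import Iterable, Iterator, Tuple


def read_batch(lines: Iterable[str]) -> Tuple[dict, Iterator[dict]]:
    # First line: header record; every later non-blank line: one document.
    it = iter(lines)
    header = json.loads(next(it))
    batch_id = header["batchID"]

    def documents() -> Iterator[dict]:
        for line in it:
            if line.strip():
                doc = json.loads(line)
                doc["batchID"] = batch_id  # stamp provenance on each document
                yield doc

    return header, documents()


header, docs = read_batch([
    '{"vers": 1, "batchID": "B213W"}',
    '{"name": "buzz", "locale": "NY"}',
    '{"name": "steve", "locale": "UK"}',
    '{"name": "john", "locale": "NY"}',
])
loaded = list(docs)
```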

Page 30

Now that we have the data…

You’re well on your way to a single view consolidation…but first:

– Data Work
  • Cross-reference important keys
  • Potential scrubbing/cleansing

– Software Stack Work

Page 31

You’ve Built a Great Data Asset; leverage it!

Page 32

DON’T Build This!

Giant Glom Of GUI-biased code

http://yourcompany/yourapp

Page 33

Build THIS!

http://yourcompany/yourapp

• Data Access Layer
• Object Construction Layer
• Basic Functional Layer
• Portal Functional Layer
• GUI adapter Layer
• Web Service Layer

• Other Regular Performance Applications
• Higher Performance Applications
• Special/Generic Applications
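A minimal Python sketch of the bottom three layers. ContactDAL, build_contact, and primary_phone are hypothetical names, and a dict stands in for the database so the sketch runs anywhere. The point is the direction of dependencies: each layer calls only the one beneath it, so the GUI adapter, the web service, and the other applications can all reuse the same functional code instead of burying it in GUI handlers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class ContactDAL:
    """Data Access Layer: the only code that knows how documents are fetched."""

    def __init__(self, find_one: Callable[[dict], Optional[dict]]):
        # e.g. a pymongo collection's find_one; any query-dict callable works
        self._find_one = find_one

    def by_did(self, did: str) -> Optional[dict]:
        return self._find_one({"did": did})


@dataclass
class Contact:
    did: str
    name: str
    phones: List[str]


def build_contact(doc: dict) -> Contact:
    """Object Construction Layer: raw document in, typed object out."""
    return Contact(doc["did"], doc["name"],
                   [p["number"] for p in doc.get("phones", [])])


def primary_phone(dal: ContactDAL, did: str) -> Optional[str]:
    """Basic Functional Layer: a business operation with no GUI concerns."""
    doc = dal.by_did(did)
    if doc is None:
        return None
    contact = build_contact(doc)
    return contact.phones[0] if contact.phones else None


# Any consumer (GUI adapter, web service, batch app) calls the same function.
store = {"D159308": {"did": "D159308", "name": "Buzz",
                     "phones": [{"number": "1-666-444-3333"}]}}
dal = ContactDAL(lambda query: store.get(query["did"]))
```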

Page 34

What Is Happening Next?

Access Control

Data Protection

Auditing

Creating A Single View series:

• Part 1: Overview & Data Analysis
• Part 2: Data Design & Loading Strategies
• Part 3: Securing Your Deployment

Page 35

Enterprise Architect, MongoDB

Buzz Moschetti

#ConferenceHashTag

Q&A