Big Data Technologies and the Evolution of NoSQL Dwight Merriman [email protected]


Why #1 : the Imperative to Scale

(horizontally)

http://www.globalnerdy.com/2007/09/07/multicore-musings/

cloud, virtualization

power/cooling

commodity

data explosion

Computers are faster, but scaling is harder

UI

compute

data processing / ETL

caching

database / datastore

network

not just the database

UI √

compute

data processing / ETL

caching √

database / datastore

network √

what’s hard?

joins?

{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  title : "Too Big to Fail",
  when : new Date("2011-07-26"),
  author : "joe",
  text : "blah",
  tags : ["business", "news", "north america"],
  votes : 3,
  voters : ["dmerr", "sj", "jane"],
  comments : [
    { by : "tim157", text : "great story" },
    { by : "gora", text : "i don't think so" },
    { by : "dmerr", text : "also check out..." }
  ]
}

transactions?

NoSQL = Non-relational next generation operational data stores and databases

no joins + light transactional semantics = horizontally scalable architectures
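One way to read this slide: if a read or write touches only a single self-contained document, that document can be routed to a node using nothing but its key, and nodes never need to coordinate with each other. The sketch below illustrates that idea in plain JavaScript. It is an assumption-laden toy, not how any particular database does it: `shardFor`, `hashKey`, `insertPost`, `findPostById`, the string `_id` values, and the in-memory `Map` objects standing in for nodes are all hypothetical.

```javascript
// Hypothetical illustration: route whole documents to shards by key.
// In-memory Maps stand in for separate database nodes.
const SHARD_COUNT = 3;
const shards = Array.from({ length: SHARD_COUNT }, () => new Map());

// Simple deterministic string hash (illustrative only, not production quality).
function hashKey(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep the value in unsigned 32-bit range
  }
  return h;
}

// Pick a shard from the key alone; no other document is consulted.
function shardFor(key) {
  return hashKey(key) % SHARD_COUNT;
}

// A write touches exactly one shard because the post embeds its comments.
function insertPost(post) {
  shards[shardFor(post._id)].set(post._id, post);
}

// A read by key also touches exactly one shard; no cross-node join is needed.
function findPostById(id) {
  return shards[shardFor(id)].get(id);
}

insertPost({
  _id: "post-1",
  title: "Too Big to Fail",
  comments: [{ by: "gora", text: "i don't think so" }],
});
insertPost({ _id: "post-2", title: "Another Story", comments: [] });

const found = findPostById("post-1");
console.log(found.title, found.comments.length);
```

Because the comments travel with the post, adding a node changes only where documents live, not how many nodes a single request has to visit.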

Why #2 : dealing with “weird” data

legal

CMS

customer preferences, behavior, relationships

organizational knowledge

team (human) process information

SFA

EMR

no joins + light transactional semantics -> new data models

{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  title : "Too Big to Fail",
  when : new Date("2011-07-26"),
  author : "joe",
  text : "blah",
  tags : ["business", "news", "north america"],
  votes : 3,
  voters : ["dmerr", "sj", "jane"],
  comments : [
    { by : "tim157", text : "great story" },
    { by : "gora", text : "i don't think so" },
    { by : "dmerr", text : "also check out..." }
  ]
}

db.posts.find( { author : "joe" } ).sort( { when : 1 } )

db.posts.find( { tags : "news", votes : { $gt : 100 } } )

db.posts.find( { "comments.by" : "gora" } )

db.posts.ensureIndex( { "comments.by" : 1 } )
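For readers without a MongoDB shell at hand, the matching behavior of the three queries above can be approximated in plain JavaScript over an in-memory array. This is only a sketch: the `posts` array, the two extra sample documents, and the variable names are hypothetical, and real MongoDB query matching has more rules than these filters capture.

```javascript
// Hypothetical in-memory stand-in for the posts collection.
const posts = [
  {
    title: "Too Big to Fail",
    when: new Date("2011-07-26"),
    author: "joe",
    tags: ["business", "news", "north america"],
    votes: 3,
    comments: [
      { by: "tim157", text: "great story" },
      { by: "gora", text: "i don't think so" },
    ],
  },
  // The two documents below are invented for illustration.
  {
    title: "Earlier Story",
    when: new Date("2011-07-01"),
    author: "joe",
    tags: ["news"],
    votes: 150,
    comments: [],
  },
  {
    title: "Match Report",
    when: new Date("2011-07-10"),
    author: "jane",
    tags: ["sports"],
    votes: 200,
    comments: [{ by: "gora", text: "nice" }],
  },
];

// Roughly: db.posts.find({ author: "joe" }).sort({ when: 1 })
// filter() returns a new array, so sorting it leaves `posts` untouched.
const byJoe = posts
  .filter((p) => p.author === "joe")
  .sort((a, b) => a.when - b.when);

// Roughly: db.posts.find({ tags: "news", votes: { $gt: 100 } })
// A scalar compared against an array field matches if any element equals it.
const popularNews = posts.filter(
  (p) => p.tags.includes("news") && p.votes > 100
);

// Roughly: db.posts.find({ "comments.by": "gora" })
// Dot notation reaches into each embedded comment; any match is enough.
const goraCommented = posts.filter((p) =>
  p.comments.some((c) => c.by === "gora")
);

console.log(byJoe.map((p) => p.title));
console.log(popularNews.map((p) => p.title));
console.log(goraCommented.map((p) => p.title));
```

Each query reads one collection and returns whole documents, which is why no join appears anywhere in the example.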

Why #3 : (software development)

Agility

{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  title : "Too Big to Fail",
  when : new Date("2011-07-26"),
  author : "joe",
  text : "blah",
  tags : ["business", "news", "north america"],
  votes : 3,
  voters : ["dmerr", "sj", "jane"],
  comments : [
    { by : "tim157", text : "great story" },
    { by : "gora", text : "i don't think so" },
    { by : "dmerr", text : "also check out..." }
  ]
}

db.posts.find( { author : "joe" } ).sort( { when : 1 } )

db.posts.find( { "comments.by" : "gora" } )
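The slides do not spell out how this model helps agility. One common reading is that documents in the same collection need not share a fixed set of fields, so an application can start storing a new field without migrating existing data. The sketch below illustrates that reading in plain JavaScript; the `collection` array, the `rating` field, and the second document are hypothetical and not taken from the talk.

```javascript
// Hypothetical in-memory stand-in for a collection written by "version 1" of an app.
const collection = [
  { title: "Too Big to Fail", author: "joe", votes: 3 },
];

// "Version 2" of the app starts storing an extra field.
// Existing documents are left as they are; no schema migration step runs.
collection.push({ title: "New Story", author: "jane", votes: 0, rating: 4.5 });

// Roughly: db.posts.find({ rating: { $exists: true } })
const rated = collection.filter((p) => "rating" in p);

// Readers tolerate the missing field by supplying a default.
const ratings = collection.map((p) =>
  p.rating !== undefined ? p.rating : null
);

console.log(rated.map((p) => p.title));
console.log(ratings);
```

The trade-off is that the application, rather than the database schema, becomes responsible for handling documents that lack the newer field.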

Future Landscape

the db space 2000 - 2010

NoSQL = Non-relational next generation operational data stores and databases

Benefits :

- scale

- leverage the vast swathes of semi-structured data

- agility, nimbleness