Big Data for One Big Family

DESCRIPTION

Presentation by Matt Asay (MongoDB) at the FamilySearch Developer Conference (2014), talking about how big data applies to family history.

Page 1: Big Data for One Big Family

MongoDB Inc. Proprietary and Confidential

Big Data for One Big Family

Matt Asay, VP, Community, MongoDB

Page 2: Big Data for One Big Family

What Genealogy Was: Neat and Tidy Data

Page 3: Big Data for One Big Family

Genealogy = Family Stories

Page 4: Big Data for One Big Family

Stories Aren’t Told in Spreadsheets

Page 5: Big Data for One Big Family

They’re Increasingly Told Like This

Page 6: Big Data for One Big Family

Modern, “Big” Data Is Messy

Page 7: Big Data for One Big Family

Data Now Looks Like This

Page 8: Big Data for One Big Family

It Looks Like People

Page 9: Big Data for One Big Family

The Big Data Unknown

Page 10: Big Data for One Big Family

Who’s Embracing Big Data?

Source: Gartner

Page 11: Big Data for One Big Family

Top Big Data Challenges?

Translation? Most struggle to know what Big Data is, how to manage it, and who can manage it.

Source: Gartner

Page 12: Big Data for One Big Family

Big Data: Volume Matters

•  More than 90% of today’s data was created in the last 2 years

•  Moore’s Law for data: Doubles at regular intervals
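
The two volume claims on this slide can be related with a little arithmetic. The sketch below is not from the talk: it assumes data grows at a steady exponential rate, and the function names are made up for illustration. Under that assumption it asks how short the doubling interval has to be for 90% of all data to be less than two years old.

```python
import math


def doubling_period_years(recent_fraction: float, window_years: float) -> float:
    """Doubling period implied when `recent_fraction` of all data was
    created within the last `window_years` (assumes steady exponential growth)."""
    # If the total doubles every T years, the data that already existed
    # window_years ago is 2 ** (-window_years / T) of today's total.
    older_fraction = 1.0 - recent_fraction
    return window_years / math.log2(1.0 / older_fraction)


def share_created_within(doubling_years: float, window_years: float) -> float:
    """Inverse check: fraction of today's data created in the last window."""
    return 1.0 - 2.0 ** (-window_years / doubling_years)


period = doubling_period_years(0.90, 2.0)
print(f"Implied doubling period: {period * 12:.1f} months")
print(f"Share created in the last 2 years: {share_created_within(period, 2.0):.0%}")
```

Under this assumed growth model, the 90% figure corresponds to a doubling interval of roughly seven months. Real growth is unlikely to be this regular, so treat the number as an order-of-magnitude illustration only.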

Page 13: Big Data for One Big Family

Big(ger) Is the New Normal

Page 14: Big Data for One Big Family

Volume Is Not Really the Problem

“Of Gartner's "3Vs" of big data (volume, velocity, variety), the variety of data sources is seen by our clients as both the greatest challenge and the greatest opportunity.”

- Forrester, 2014

What are the primary data issues driving you to consider Big Data?*

•  Data Variety (68%): Diverse, streaming or new data types

•  Data Volume (15%): Greater than 100TB

•  Other Data (17%): Less than 100TB

* From Big Data Executive Summary of 50+ execs from F100, gov orgs

Page 15: Big Data for One Big Family

Compounding the Confusion

Page 16: Big Data for One Big Family

We Hire for Machines but…

Source: KDnuggets, 2014

Page 17: Big Data for One Big Family

Time to Rethink the Solution

Page 18: Big Data for One Big Family

NoSQL Born for Unstructured Data

[Chart: NoSQL Data Types (multiples allowed). Scale: 0% to 60%. Categories: Log data, Free-form text, Web or mobile content, Social media data, Geospatial data, Transactions, Mobile device data, Web sessions or caching data, Sensor data, Email/documents, Machine data, Images, Video, Audio]

Source: Gartner, 2014

Page 19: Big Data for One Big Family

Innovation As Iteration

Page 20: Big Data for One Big Family

“I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison

Page 21: Big Data for One Big Family

Back in 1970…Cars Were Great!

Page 22: Big Data for One Big Family

So Were Computers!

Page 23: Big Data for One Big Family

Including the Relational Database

Page 24: Big Data for One Big Family

Lots of Great Innovations Since 1970

Page 25: Big Data for One Big Family

Legacy Data Infrastructure Makes Development Hard

[Diagram labels: Application, Object Relational Mapping, Relational Database; Code, XML Config, DB Schema]

Page 26: Big Data for One Big Family

And Even Harder To Iterate

[Diagram: a table with columns Name, Pet, Phone, Email, annotated with New Column, New Column, New Table, New Table]

3 months later…
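
The iteration cost this slide points at can be sketched with Python's built-in `sqlite3` module. This is an illustration rather than anything shown in the talk. The table names, column names, and sample values are hypothetical, chosen to echo the Name, Pet, Phone, and Email labels on the slide.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Version 1 of the schema: only name and phone were anticipated.
conn.execute("CREATE TABLE people (name TEXT, phone TEXT)")
conn.execute(
    "INSERT INTO people (name, phone) VALUES (?, ?)",
    ("Ada Example", "555-0100"),
)

# Tracking email later means a schema change that touches the whole table.
conn.execute("ALTER TABLE people ADD COLUMN email TEXT")

# A person can have several pets, so that attribute needs a new table.
conn.execute("CREATE TABLE pets (owner_name TEXT, pet_name TEXT)")
conn.executemany(
    "INSERT INTO pets (owner_name, pet_name) VALUES (?, ?)",
    [("Ada Example", "Rex"), ("Ada Example", "Tom")],
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
people_columns = [row[1] for row in conn.execute("PRAGMA table_info(people)")]
existing_row = conn.execute("SELECT name, phone, email FROM people").fetchone()
pet_names = [
    row[0]
    for row in conn.execute(
        "SELECT pet_name FROM pets WHERE owner_name = ? ORDER BY pet_name",
        ("Ada Example",),
    )
]

print(people_columns)
print(existing_row)
print(pet_names)
```

Each new attribute here needed either an `ALTER TABLE` or a `CREATE TABLE`. In a production system those statements usually also mean migration scripts and updated mapping code, which is plausibly the delay the slide has in mind.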

Page 27: Big Data for One Big Family

Scale and Flexibility Drive Choices

[Chart: What motivated you to use a NoSQL database over traditional alternatives? (multiples allowed). Scale: 0% to 90%. Categories: Scalability, Schema flexibility, Ease of development, Cost, Availability of cloud deployment options]

Source: Gartner, 2014

Page 28: Big Data for One Big Family

RDBMS

NoSQL Drives Agility

MongoDB

{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health", plan : "PPO Plus" },
    { type : "Dental", plan : "Standard" }
  ]
}
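
To make the agility point concrete, here is a minimal sketch that uses plain Python dictionaries as a stand-in for documents. It does not use the MongoDB driver or any MongoDB API. The first record mirrors the slide's example, with `_id` left out. The second record, the `agency` field, the "Vision" benefit, and the helper `plans_by_type` are hypothetical additions for illustration.

```python
# The slide's employee record as a nested document (the _id field is omitted).
employee = {
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}

# Hypothetical second record with a different shape: no benefits list,
# plus an extra field the first record does not have.
contractor = {
    "employee_name": "Example, Pat",
    "department": "Marketing",
    "agency": "Hypothetical Staffing",
}

# Records of different shapes sit side by side; no shared schema is declared.
collection = [employee, contractor]


def plans_by_type(doc: dict) -> dict:
    """Map benefit type to plan, tolerating documents without benefits."""
    return {b["type"]: b["plan"] for b in doc.get("benefits", [])}


# Adding a new kind of benefit is a data change, not a schema change.
employee["benefits"].append({"type": "Vision", "plan": "Basic"})

for doc in collection:
    print(doc["employee_name"], plans_by_type(doc))
```

The one-to-many benefits data lives inside the record it belongs to, so reading it needs no join, and records that lack the field are handled with a default rather than a migration.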

Page 29: Big Data for One Big Family

Optimize for (Developer) Iteration

[Chart: Infrastructure Cost and Engineer Cost, 1985 to 2013]

Page 30: Big Data for One Big Family

So…Use Open Source

Page 31: Big Data for One Big Family

Big Data != Big Upfront Payment

Page 32: Big Data for One Big Family

Shouldn’t Be Penalized for Success

“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”

- IBM press release, 28 Aug 2012

Page 33: Big Data for One Big Family

Cloud Fosters Experimentation

Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on. You need an environment that is flexible and allows you to quickly respond to changing big data requirements. Your resource mix is continually evolving - if you buy infrastructure it's almost immediately irrelevant to your business because it's frozen in time. It's solving a problem you may not have or care about any more.

- Matt Wood, GM of Data Science, Amazon Web Services

Page 34: Big Data for One Big Family

NoDoop: Not Only Hadoop

Source: Silicon Angle, 2012

Page 35: Big Data for One Big Family

The Data Scientist Is You

“Organizations already have people who know their own data better than mystical data scientists…. Learning Hadoop is easier than learning the company’s business.”

(Gartner, 2012)

Page 36: Big Data for One Big Family

@mjasay