Presentation by Matt Asay (MongoDB) at the FamilySearch Developer Conference (2014), talking about how big data applies to family history.
MongoDB Inc. Proprietary and Confidential
Big Data for One Big Family
Matt Asay, VP, Community, MongoDB
2
What Genealogy Was: Neat and Tidy Data
3
Genealogy = Family Stories
4
Stories Aren’t Told in Spreadsheets
5
They’re Increasingly Told Like This
6
Modern, “Big” Data Is Messy
7
Data Now Looks Like This
8
It Looks Like People
The Big Data Unknown
10
Who’s Embracing Big Data?
Source: Gartner
11
Top Big Data Challenges?
Translation? Most organizations struggle to know what Big Data is, how to manage it, and who can manage it
Source: Gartner
12
Big Data: Volume Matters
• More than 90% of today’s data was created in the last 2 years
• Moore’s Law for data: doubles at regular intervals
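The two volume bullets can be tied together with a little arithmetic. This rests on an assumption that is not stated on the slide: the total stock of data S grows exponentially with a fixed doubling time T (in years), and nothing is deleted. Under that simple model, the share of today's data created in the last 2 years is:

$$
S(t) = S_0\,2^{t/T}, \qquad
\frac{S(t) - S(t-2)}{S(t)} = 1 - 2^{-2/T}
$$

$$
1 - 2^{-2/T} > 0.9
\iff 2^{-2/T} < 0.1
\iff \frac{2}{T} > \log_2 10 \approx 3.32
\iff T < 0.60\ \text{years}
$$

Under this model, "more than 90% in the last 2 years" implies a doubling time of roughly seven months or less. By comparison, a two-year doubling time would put only 1 - 2^{-1} = 50% of the data in the most recent two years.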
13
Big(ger) Is the New Normal
14
Volume Is Not Really the Problem
“Of Gartner's '3Vs' of big data (volume, velocity, variety), the variety of data sources is seen by our clients as both the greatest challenge and the greatest opportunity.”
- Forrester, 2014
What are the primary data issues driving you to consider Big Data?*
• Data Variety (68%): diverse, streaming, or new data types
• Data Volume (15%): greater than 100 TB
• Other Data (17%): less than 100 TB
* From a Big Data Executive Summary of 50+ executives from Fortune 100 companies and government organizations
15
Compounding the Confusion
16
We Hire for Machines but…
Source: KDnuggets, 2014
17
Time to Rethink the Solution
18
NoSQL: Born for Unstructured Data
[Chart: NoSQL Data Types (multiples allowed). Share of respondents, on a 0 to 60% axis, by data type: log data, free-form text, web or mobile content, social media data, geospatial data, transactions, mobile device data, web sessions or caching data, sensor data, email/documents, machine data, images, video, audio]
Source: Gartner, 2014
Innovation As Iteration
“I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison
21
Back in 1970…Cars Were Great!
22
So Were Computers!
23
Including the Relational Database
24
Lots of Great Innovations Since 1970
25
Legacy Data Infrastructure Makes Development Hard
[Diagram: the stack from application to data: Application Code, XML Config, Object Relational Mapping, DB Schema, Relational Database]
26
And Even Harder To Iterate
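[Diagram: a table with columns Name, Pet, Phone, Email, extended step by step with a New Table, another New Table, a New Column, and another New Column; caption: “3 months later…”]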
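The iteration problem on this slide can be sketched in Python. The standard-library `sqlite3` module stands in for a relational database, and a list of plain dicts stands in for a document store. This is a minimal sketch under those assumptions, not MongoDB's API. The table name, rows, and values are hypothetical, chosen to echo the slide's Name/Pet/Phone/Email columns.

```python
import sqlite3

# Relational store (SQLite as a stand-in): every row must match the declared columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, phone TEXT, email TEXT)")
conn.execute("INSERT INTO people VALUES (?, ?, ?)",
             ("Ada", "555-0100", "ada@example.com"))

# Hypothetical new requirement: also record a pet.
new_row = ("Grace", "555-0199", "grace@example.com", "cat")
try:
    conn.execute("INSERT INTO people (name, phone, email, pet) VALUES (?, ?, ?, ?)",
                 new_row)
    needed_migration = False
except sqlite3.OperationalError:
    # SQLite rejects the unknown 'pet' column, so the schema must change first.
    needed_migration = True
    conn.execute("ALTER TABLE people ADD COLUMN pet TEXT")
    conn.execute("INSERT INTO people (name, phone, email, pet) VALUES (?, ?, ?, ?)",
                 new_row)

# Existing rows pick up the new column as NULL (None in Python).
relational_rows = conn.execute("SELECT name, pet FROM people ORDER BY name").fetchall()

# Document-style store (plain dicts as a stand-in): each record carries its own fields.
people_docs = [{"name": "Ada", "phone": "555-0100", "email": "ada@example.com"}]
people_docs.append({"name": "Grace", "phone": "555-0199",
                    "email": "grace@example.com", "pet": "cat"})  # no migration step

print(needed_migration)                      # True
print(relational_rows)                       # [('Ada', None), ('Grace', 'cat')]
print([d.get("pet") for d in people_docs])   # [None, 'cat']
```

The dict list only illustrates the data model. Indexing, validation, and concurrent access are handled separately in any real database.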
27
Scale and Flexibility Drive Choices
[Chart: What motivated you to use a NoSQL database over traditional alternatives? (multiples allowed). Share of respondents, on a 0 to 90% axis, for: scalability, schema flexibility, ease of development, cost, availability of cloud deployment options]
Source: Gartner, 2014
28
NoSQL Drives Agility
RDBMS vs. MongoDB
{ _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health", plan : "PPO Plus" },
    { type : "Dental", plan : "Standard" }
  ]
}
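The contrast with the employee document above can be made concrete with a minimal Python sketch. It assumes the standard-library `sqlite3` module as a stand-in for an RDBMS and a plain dict as a stand-in for the MongoDB document; it does not use a MongoDB driver. The field values come from the slide, with the `_id` left out. The table and column names (`employees`, `benefits`, `employee_id`, `benefit_type`, `plan_name`) are hypothetical.

```python
import json
import sqlite3

# Document model: the employee and their benefits live in one nested record.
employee_doc = {
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}

# Relational model (hypothetical schema): the repeating benefits list becomes
# a child table linked back to the employee by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, employee_name TEXT,
                            department TEXT, title TEXT, report_up TEXT, pay_band TEXT);
    CREATE TABLE benefits (employee_id INTEGER REFERENCES employees(id),
                           benefit_type TEXT, plan_name TEXT);
""")
cur = conn.execute(
    "INSERT INTO employees (employee_name, department, title, report_up, pay_band) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Dunham, Justin", "Marketing", "Product Manager, Web", "Neray, Graham", "C"),
)
emp_id = cur.lastrowid
conn.executemany(
    "INSERT INTO benefits (employee_id, benefit_type, plan_name) VALUES (?, ?, ?)",
    [(emp_id, b["type"], b["plan"]) for b in employee_doc["benefits"]],
)

# Reassembling the employee's benefits on the relational side takes a JOIN.
rows = conn.execute(
    "SELECT e.employee_name, b.benefit_type, b.plan_name FROM employees e "
    "JOIN benefits b ON b.employee_id = e.id ORDER BY b.rowid"
).fetchall()
print(rows)

# The document is already in the nested shape the application works with.
print(json.dumps(employee_doc["benefits"]))
```

The design trade-off is that embedding keeps data that is read together in one record. The normalized form avoids duplication when the same child rows are shared or queried independently.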
29
Optimize for (Developer) Iteration
[Chart: infrastructure cost vs. engineer cost, 1985 to 2013]
30
So…Use Open Source
31
Big Data != Big Upfront Payment
32
Shouldn’t Be Penalized for Success
“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”
- IBM press release, 28 Aug 2012
33
Cloud Fosters Experimentation
Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on. You need an environment that is flexible and allows you to quickly respond to changing big data requirements. Your resource mix is continually evolving - if you buy infrastructure it's almost immediately irrelevant to your business because it's frozen in time. It's solving a problem you may not have or care about any more.
- Matt Wood, GM of Data Science, Amazon Web Services
34
NoDoop: Not Only Hadoop
Source: SiliconANGLE, 2012
35
The Data Scientist Is You
“Organizations already have people who know their own data better than mystical data scientists….Learning Hadoop is easier than learning the company’s business.”
(Gartner, 2012)
@mjasay