A next generation introduction to data science and its potential to change business as we know it

By Health Symmetric, Inc.

David Smith, President

[email protected]

linkedin.com/in/davidsmithaustin

4/3/2014


Presented at InnoTech San Antonio 2014. All rights reserved.



The Age of Data

• In the last two years we have generated more data than in all of prior human history

• Data is expected to double in size every two years through 2020, exceeding 40 zettabytes (40 trillion gigabytes)

Timeline shown: The Beginning – 2011, 2012 - 2014, 2020

The Economist: digital information increases 10 times every 5 years!

Business Problem

More than half of business and IT executives, 56 percent, report they feel overwhelmed by the amount of data their company manages. Many report they are often delayed in making important decisions as a result of too much information. Surprisingly, 62 percent of C‐level respondents – whose time is considered the most valuable in most organizations – report being frequently interrupted by irrelevant incoming data. 


Entering the Age of Data

•Data is THE central business asset:
  – “Data are an organization’s sole, non‐depletable, non‐degrading, durable asset. Engineered right, data’s value increases over time because of the added dimensions of time, geography, and precision.” (Peter Aitken)

•Data generation has changed forever
  – Instrumentation of all businesses, people, machines

•Data is born digitally and flows constantly
  – “All things are flowing...” (Heraclitus, 500 BC)

•The past fifteen years have seen extensive investments in business infrastructure, which have improved the ability to collect data throughout the enterprise.

•Virtually every aspect of business is now open to data collection and often even instrumented for data collection: operations, manufacturing, supply‐chain management, customer behavior, marketing campaign performance, workflow procedures, and so on.

•At the same time, information is now widely available on external events such as market trends, industry news, and competitors’ movements.

•This broad availability of data has led to increasing interest in methods for extracting useful information and knowledge from data: the realm of data science.


The Ubiquity of Data Opportunities 

• With vast amounts of data now available, companies in almost every industry are focused on exploiting data for competitive advantage.

• In the past, firms could employ teams of statisticians, modelers, and analysts to explore datasets manually, but the volume and variety of data have far outstripped the capacity of manual analysis.

• At the same time, computers have become far more powerful, networking has become ubiquitous, and algorithms have been developed that can connect datasets to enable broader and deeper analyses than previously possible.

• The convergence of these phenomena has given rise to the increasingly widespread business application of data science principles and data mining techniques.


Emergence of a Fourth Research Paradigm: Data Science

• Thousand years ago – Experimental Science
  – Description of natural phenomena

• Last few hundred years – Theoretical Science
  – Newton’s Laws, Maxwell’s Equations…

• Last few decades – Computational Science
  – Simulation of complex phenomena

• Today – Data‐Intensive Science
  – Scientists overwhelmed with data!


Good News: Big Data is Sexy


http://dilbert.com/strips/comic/2012-09-05/

Data Scientist

“Data Scientist”

• “Data Scientist: The Sexiest Job of the 21st Century” – Harvard Business Review, October 2012

• The “Hot new gig in town” – O’Reilly report

• “The next sexy job in the next 10 years will be statistician” – Hal Varian, Google Chief Economist

• Geek Chic – Wall Street Journal – new cool kids on campus

•  The future belongs to the companies and people that turn data into products

•  “The human expertise to capture and analyze big data is both the most expensive and the most constraining factor for most organizations pursuing big data initiatives” – Thomas Davenport



Defining Data Science

• Interdisciplinary field using techniques and theories from many fields, including math, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing, with the goal of extracting meaning from data and creating data products.

• Data science is a relatively new term, though an increasingly common one, that is often used interchangeably with competitive intelligence or business analytics.

• Data science seeks to use all available and relevant data to effectively tell a story that can be easily understood by non‐practitioners.

Source: http://en.wikipedia.org/wiki/Data_science


Venn Diagram of Data Scientists


Statistics vs. Data Science

http://blog.revolutionanalytics.com/data‐science/

Business Intelligence vs. Data Science

http://blog.revolutionanalytics.com/data‐science/


Big‐Data

Gartner Hype Cycle for Big Data, 2012


Data Science as a strategic asset

“85% of eBay’s analytic workload is new and unknown. We are architected for the unknown.”  

Oliver Ratzesberger, eBay

• Data exploration – data as the new oil
  – The exploration for data, rather than the exploration of data
  – Uncovering pockets of untapped data
  – Processing the whole data set, without sampling

eBay’s Singularity platform combines transactional data with behavioral data, enabling identification of top sellers and driving increased revenue from those sellers

Data Science as a strategic asset

“Groupon will not be the first or last organization to compete and win on the power of data. It’s happening everywhere.”  

Reid Hoffman and James Slavet, Greylock Partners

• Data harnessing – data as renewable energy
  – Harnessing naturally occurring data streams
  – Like harnessing raw energy to be converted into usable energy
  – Conversion of raw data into usable data


Today most big data is retrospective. Why is there a need for real‐time and predictive?

Retrospective

Real‐time

Predictive


Today's Cycle

Where is Real Time?

Advanced Analytics

• The time available to use the output keeps getting shorter – Real Time is becoming very common

• Human resources are limited, and human performance is often unreliable due to fatigue and distraction. Therefore, automated real‐time sensor processing techniques are required to reliably detect and discriminate targets of interest

• Limited automated processing and tagging tools – Still NOT enough


Evolution of Database Technology

• 1960s:
  – Data collection, database creation, IMS and network DBMS

• 1970s:
  – Relational data model, relational DBMS implementation

• 1980s:
  – RDBMS, advanced data models (extended‐relational, OO, deductive, etc.)
  – Application‐oriented DBMS (spatial, scientific, engineering, etc.)

• 1990s:
  – Data mining, data warehousing, multimedia databases, and Web databases

• 2000s:
  – Stream data management and mining
  – Data mining and its applications
  – Web technology (XML, data integration) and global information systems

Even as clouds and big data take hold, the IT landscape is changing rapidly…

•Technology is rapidly being commoditized

•Businesses are more willing and able to shop for IT services

•In‐house IT infrastructure is increasingly seen as complex and rigid

•Unstructured data is the new gold

© Harvard Business Review


Big Data Numbers

• How much data is in the world?

• 800 Terabytes, 2000

• 160 Exabytes, 2006

• 500 Exabytes(Internet), 2009

• 2.7 Zettabytes, 2012

• 35 Zettabytes by 2020

• How much data is generated in ONE day?

• 7 TB, Twitter

• 10 TB, Facebook

Source: “Big data: The next frontier for innovation, competition, and productivity”, McKinsey Global Institute


[Chart: Growth at the Edge of the Network. Global petabytes/day (axis 0 to 4,000) by year, 2003 to 2011, with a 2012 label. Series: Traditional Computation and Converged Content. Edge sources listed: Mobile, Device to Device, Sensors, Entertainment, Smart Home, Distributed Industrial, Autos/Trucks, Smart Toys.]

Internet of Things 

• A system . . . that would be able to instantaneously identify any kind of object.

• Network of objects . .

• One major next step in this development of the Internet, which is to progressively evolve from a network of interconnected computers to a network of interconnected objects …

• From communicating people (Internet) ... to communicating items …

• From human triggered communication … to event triggered communication


Internet of Things and the Cloud 

• It is projected that there will be 24 billion devices on the Internet by 2020.  Most will be small sensors that send streams of information into the cloud where it will be processed and integrated with other streams and turned into knowledge that will help our lives in a multitude of small and big ways.  

• The cloud will become increasingly important as a controller of and resource provider for the Internet of Things.

• Beyond today’s use for smart phone and gaming console support, “Intelligent River”, “smart homes”, and “ubiquitous cities” build on this vision, and we could expect growth in cloud supported/controlled robotics.

• Some of these “things” will be supporting science

• Natural parallelism over “things”

• “Things” are distributed and so form a Grid


Data available from “Internet of Things”


Sensors (Things) as a Service

Sensors as a Service

Sensor Processing as a Service (could use MapReduce)

A larger sensor ………

Output Sensor

https://sites.google.com/site/opensourceiotcloud/ Open Source Sensor (IoT) Cloud
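The slide notes that sensor processing could use MapReduce. As a rough sketch of what that might mean, the Python below imitates the map, shuffle, and reduce phases inside a single process to compute an average reading per sensor. The readings, sensor IDs, and function names are invented for illustration; a real deployment would run these phases on a distributed framework rather than in plain Python.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sensor readings as (sensor_id, value) pairs; not from the slides.
readings = [("s1", 20.0), ("s2", 18.5), ("s1", 22.0), ("s2", 19.5), ("s3", 30.0)]

def map_phase(record):
    """Emit (key, (value, count)) so that partial sums can be combined later."""
    sensor_id, value = record
    return sensor_id, (value, 1)

def shuffle(pairs):
    """Group mapped values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups

def reduce_phase(values):
    """Combine (sum, count) pairs for one key and return the average."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return total / count

grouped = shuffle(map(map_phase, readings))
averages = {sensor: reduce_phase(vals) for sensor, vals in grouped.items()}
print(averages)
```

Emitting (value, count) pairs instead of raw values keeps the reduce step associative, which is what lets a framework combine partial results from many machines.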


Tapping into the Data

• Data Storage

• Reporting

• Analytics

• Advanced Analytics
  – Computing with big datasets is a fundamentally different challenge than doing “big compute” over a small dataset

Unutilized data that can be available to business

Utilized data


Business, Knowledge, and Innovation Landscape

• Typically 80% of the key knowledge (and value) is held by 20% of the people – we need to get it to the right people

• Only 20% of the knowledge in an organization is typically used (the rest being undiscovered or under‐utilized)

• 80‐90% of the products and services today will be obsolete in 10 years – companies need to innovate & invent faster


“Big Data” and its close relatives “Cloud Computing”, “Social Media” and “Mobile” are the new frontier of innovation.

Driven by Data Science and Advanced Analytics

Volume, Variety, Velocity …


Volume

Volume is increasing at incredible rates. More people than ever use high‐speed internet connections, those people are becoming more proficient at creating content, and more people in general are contributing information. Together, these forces are causing the tremendous increase in Volume.

Variety

Next in breaking down Big Data into easily digestible bite‐size chunks is the concept of Variety. Take your personal experience and think about how much information you create and contribute in your daily routine. Your voicemails, your e‐mails, your file shares, your TV viewing habits, your Facebook updates, your LinkedIn activity, your credit card transactions, etc. 

Whether you consciously think about it or not, the Variety of information you personally create each day, all of which is being collected and analyzed, is simply overwhelming.


Velocity

The speed at which data enters organizations these days is absolutely amazing. High internet bandwidth is now nearly commonplace and, combined with the proliferation of mobile devices, it gives people more opportunity than ever to contribute content to storage systems.

[Diagram labels: CRM Data, GPS, Demand, Speed, Velocity, Transactions, Opportunities, Service Calls, Customer, Sales Orders, Inventory, Emails, Tweets, Planning, Things, Mobile, Instant Messages]

VELOCITY: Worldwide digital content will double in 18 months, and every 18 months thereafter.

VOLUME: In 2005, humankind created 150 exabytes of information. In 2011, over 1,200 exabytes were created.

VARIETY: 80% of enterprise data will be unstructured, spanning traditional and non‐traditional sources.
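The growth figures quoted in this deck can be compared by converting each to a doubling time. The sketch below assumes constant exponential growth, which the slides do not state, and the function name is chosen only for illustration. It uses the 150 and 1,200 exabyte figures above and the "10 times in 5 years" figure attributed to The Economist earlier in the deck.

```python
import math

def doubling_time_years(start, end, years):
    """Doubling time implied by growth from start to end over the given span,
    assuming constant exponential growth (an assumption, not a slide claim)."""
    return years * math.log(2) / math.log(end / start)

# Volume figures quoted above: 150 EB in 2005 and 1,200 EB in 2011.
volume_doubling = doubling_time_years(150, 1200, 2011 - 2005)

# Tenfold growth in five years, as quoted from The Economist.
tenfold_doubling = doubling_time_years(1, 10, 5)

print(f"150 EB to 1,200 EB in 6 years: doubles about every {volume_doubling:.1f} years")
print(f"10x in 5 years: doubles about every {tenfold_doubling * 12:.0f} months")
```

Under that assumption, tenfold growth in five years works out to a doubling time of roughly 18 months, in line with the velocity claim, while the 2005 and 2011 volume figures imply a doubling time of about two years (somewhat less, since the slide says "over" 1,200 exabytes).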


But I Believe These are the Real Four


What matters when dealing with Data Science?

• Scalability

• Streaming

• Context

• Quality

• Usage

As the world gets smarter, infrastructure demands will grow

Smart traffic  systems 

Smart water management 

Smart energy grids

Smart healthcare

Smart food systems

Smart oil field technologies

Smart regions

Smart weather 

Smart countries

Smart supply chains 

Smart cities

Smart retail


Mobile Devices

• Mobile computers:
  – Mainly smartphones, tablets

• Sensors: GPS, camera, accelerometer, etc.

• Computation: powerful CPUs (≥ 1 GHz, multi‐core)

• Communication: cellular/4G, Wi‐Fi, near field communication (NFC), etc.

• Many connect to cellular networks: billing system

• Cisco: 7 billion mobile devices will have been sold by 2012

Organization


Plethora of “Big Data” related tools


Data versus Information versus Action

What we have / What we want

• Data is raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized.

• Information is data that has been processed, organized, structured or presented in a given context so as to make it useful.

Real Time Embedded Analytics

[Diagram labels]

• Steps: Create Analytical Models; Deploy Analytical Models; Alerts, Notifications or Recommendations; Modifications to Workflow

• Domains: Clinical, Financial, Operational

• Data sources: Patient Personal Data, Physician Office Data, Hospital Data, Hospital System Data, Regional Data, Statewide Data, National Data, Web Data

• Other labels: SocialCare Data Science Analytics, Web, Models, Investigative Analytics

SocialCare Confidential and Proprietary


Some Existing SocialCare Beta Relations

• patients: identity, created_at, updated_at, external_id_hash, idx_1, idx_2, data

• practice_patients: practice_identity, patient_identity, created_at, updated_at, mrn

• patient_soap_notes: identity, practice_identity, patient_identity

• practices: identity, name, settings, address, phone, deleted, created_at, updated_at, roles_and_permissions, symptoms, practice_type, practice_sub_type, customization

• patient_data_store: patient_identity, classifier, signature, created_by, updated_by, created_epoch, updated_epoch, data

Callouts on the data fields:

• JSON data stored in this field as an array. No Postgres queries possible: Name, Address, etc.

• JSON data stored in this field as an array. No Postgres queries possible: Allergies, SOAP Notes, Medications, etc.
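To illustrate the consequence noted in the callouts, here is a minimal sketch of what answering a simple question looks like when the JSON sits in an opaque text field: every row has to be fetched and decoded in application code before it can be filtered. The rows, key names inside the JSON, and the helper function are hypothetical, loosely shaped like patient_data_store; the slides do not show the actual document layout.

```python
import json

# Hypothetical rows: the `data` column holds serialized JSON text, so the
# database sees only an opaque string and cannot filter on its contents.
rows = [
    {"patient_identity": 3, "classifier": "clinical",
     "data": json.dumps([{"allergies": ["penicillin"], "medications": ["metformin"]}])},
    {"patient_identity": 6, "classifier": "clinical",
     "data": json.dumps([{"allergies": [], "medications": ["lisinopril"]}])},
]

def patients_with_allergy(rows, allergen):
    """Decode every row's JSON array and keep patients listing the allergen."""
    matches = []
    for row in rows:
        for entry in json.loads(row["data"]):
            if allergen in entry.get("allergies", []):
                matches.append(row["patient_identity"])
                break  # one match per patient is enough
    return matches

print(patients_with_allergy(rows, "penicillin"))
```

The cost of this pattern grows with the number of rows, since nothing can be indexed or filtered before the data leaves the database.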

SocialCare Example Objects and Relationships

[Diagram labels]

• Objects and components: Patient #3, Patient #6, Patient #9 (Remote), Physician #1, Physician #2, Physician #3, Physician #10 (Remote), Practice #1, Practice #2, Practice #3, Hospital #1, Clinical Quality Measures #1, Lab #1, Lab Request #7, Lab Response #8, Observation #1, SOAP Note #1, Continuity Of Care #1, Continuity Of Care #2, Xray #1 (Logical ID = 1; Version IDs 1, 2, 3), Incoming Referral, Outgoing Referral, Annotated Document, Document Store, Patient Registry, Provider Registry, Export CCD, Import CCD

• Relationships: Is Primary Care Physician For, Had Test, Works In, Has Sub‐practice, Has Quality Measure, Associated With, Made Observation, Had Observation, Made Referral, Received Referral, Requestor, Subject, Response For, Source
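One simple way to hold objects and relationships like these is as a list of (subject, relationship, object) triples with an index by subject. The sketch below is only an illustration: the node and relationship names come from the diagram labels, but which objects each relationship actually connects is assumed here, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical edges: the pairing of objects with relationships is assumed,
# since the extracted slide text lists labels without their connections.
edges = [
    ("Physician #1", "Is Primary Care Physician For", "Patient #6"),
    ("Physician #1", "Works In", "Practice #1"),
    ("Patient #6", "Had Test", "Lab #1"),
    ("Physician #2", "Made Observation", "Observation #1"),
    ("Patient #6", "Had Observation", "Observation #1"),
]

# Index outgoing edges by subject for quick traversal.
outgoing = defaultdict(list)
for subject, relation, obj in edges:
    outgoing[subject].append((relation, obj))

def related(subject, relation):
    """Return every object linked from `subject` by `relation`."""
    return [obj for rel, obj in outgoing[subject] if rel == relation]

print(related("Patient #6", "Had Test"))
print(related("Physician #1", "Works In"))
```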


Conclusion

• The Age of Data is here

• Data is the central business asset

• Data generation has changed forever

• The World is moving to Real Time

• Data Science is the Key

• Your legacy analytic software WILL fail in the Age of Data

• Crisis of software that scales to meet demand

• Advanced Analytics must be embedded in the collectors and sensors

• Think about where the data comes from

• Attempt to capture and analyze any data that might be relevant, regardless of where it resides

• Data Science is changing how data is:
  – Collected, discovered, analyzed, used, acted upon …