Presented at InnoTech San Antonio 2014. All rights reserved.
4/3/2014
By Health Symmetric, Inc.
A next generation introduction to data science and its potential to change business as we know it
by David Smith
David [email protected]/in/davidsmithaustin
The Age of Data
• In the last two years we have generated more data than in all of previous human history
• Data is expected to double in size every two years through 2020, exceeding 40 zettabytes (40 trillion gigabytes)
[Timeline labels: The Beginning – 2011; 2012 - 2014; 2020]
The Economist: digital information increases 10 times every 5 years!
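The doubling claim on this slide can be checked with a little arithmetic. The sketch below is an illustration rather than part of the original talk: it assumes a starting point of 2.7 zettabytes in 2012 (the figure quoted later on the Big Data Numbers slide) and a clean doubling every two years. The function name `projected_zettabytes` is hypothetical.

```python
def projected_zettabytes(start_zb: float, start_year: int, target_year: int,
                         doubling_years: float = 2.0) -> float:
    """Project data volume assuming it doubles every `doubling_years` years."""
    elapsed = target_year - start_year
    return start_zb * 2 ** (elapsed / doubling_years)


# Assumed starting point: 2.7 ZB in 2012, doubling every two years.
for year in range(2012, 2021, 2):
    print(year, round(projected_zettabytes(2.7, 2012, year), 1), "ZB")
```

Under these assumptions the 2020 projection is about 43 ZB, which is consistent with the "exceeding 40 zettabytes" estimate.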
Business Problem
More than half of business and IT executives, 56 percent, report they feel overwhelmed by the amount of data their company manages. Many report they are often delayed in making important decisions as a result of too much information. Surprisingly, 62 percent of C‐level respondents – whose time is considered the most valuable in most organizations – report being frequently interrupted by irrelevant incoming data.
Entering the Age of Data
• Data is THE central business asset:
  • “Data are an organization’s sole, non‐depletable, non‐degrading, durable asset. Engineered right, data’s value increases over time because of the added dimensions of time, geography, and precision.” (Peter Aiken)
• Data generation has changed forever
  • Instrumentation of all businesses, people, machines
• Data is born digitally and flows constantly
  • “All things are flowing...” (Heraclitus, 500 BC)
•The past fifteen years have seen extensive investments in business infrastructure, which have improved the ability to collect data throughout the enterprise.
•Virtually every aspect of business is now open to data collection and often even instrumented for data collection: operations, manufacturing, supply‐chain management, customer behavior, marketing campaign performance, workflow procedures, and so on.
•At the same time, information is now widely available on external events such as market trends, industry news, and competitors’ movements.
•This broad availability of data has led to increasing interest in methods for extracting useful information and knowledge from data: the realm of data science.
The Ubiquity of Data Opportunities
• With vast amounts of data now available, companies in almost every industry are focused on exploiting data for competitive advantage.
• In the past, firms could employ teams of statisticians, modelers, and analysts to explore datasets manually, but the volume and variety of data have far outstripped the capacity of manual analysis.
• At the same time, computers have become far more powerful, networking has become ubiquitous, and algorithms have been developed that can connect datasets to enable broader and deeper analyses than previously possible.
• The convergence of these phenomena has given rise to the increasingly widespread business application of data science principles and data mining techniques.
Emergence of a Fourth Research Paradigm: Data Science
• Thousand years ago: Experimental Science
  Description of natural phenomena
• Last few hundred years: Theoretical Science
  Newton’s Laws, Maxwell’s Equations…
• Last few decades: Computational Science
  Simulation of complex phenomena
• Today: Data‐Intensive Science
  Scientists overwhelmed with data!
Good News: Big Data is Sexy
http://dilbert.com/strips/comic/2012-09-05/
Data Scientist
• “Data Scientist: The Sexiest Job of the 21st Century” (Harvard Business Review, October 2012)
• The “Hot new gig in town” (O’Reilly report)
• “The next sexy job in the next 10 years will be statistician” – Hal Varian, Google Chief Economist
• Geek Chic – Wall Street Journal – new cool kids on campus
• The future belongs to the companies and people that turn data into products
• “The human expertise to capture and analyze big data is both the most expensive and the most constraining factor for most organizations pursuing big data initiatives” – Thomas Davenport
Defining Data Science
• Interdisciplinary field using techniques and theories from many fields, including math, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.
• Data science is a novel term that is often used interchangeably with competitive intelligence or business analytics, although it is becoming more common.
• Data science seeks to use all available and relevant data to effectively tell a story that can be easily understood by non‐practitioners.
http://en.wikipedia.org/wiki/Data_science
Venn Diagram of Data Scientists
Statistics vs. Data Science
http://blog.revolutionanalytics.com/data‐science/
Business Intelligence vs. Data Science
http://blog.revolutionanalytics.com/data‐science/
Big‐Data
Gartner Hype Cycle for Big Data, 2012
Data Science as a strategic asset
“85% of eBay’s analytic workload is new and unknown. We are architected for the unknown.”
Oliver Ratzesberger, eBay
• Data exploration – data as the new oil
  • The exploration for data, rather than the exploration of data
  • Uncovering pockets of untapped data
  • Processing the whole data set, without sampling
eBay’s Singularity platform combines transactional data with behavioral data, enabling identification of top sellers and driving increased revenue from those sellers
Data Science as a strategic asset
“Groupon will not be the first or last organization to compete and win on the power of data. It’s happening everywhere.”
Reid Hoffman and James Slavet, Greylock Partners
• Data harnessing – data as renewable energy
  • Harnessing naturally occurring data streams
  • Like harnessing raw energy to be converted into usable energy
  • Conversion of raw data into usable data
Today most big data is retrospective. Why is there a need for real‐time and predictive?
Retrospective
Real‐time
Predictive
Today's Cycle
Where is Real Time?
Advanced Analytics
• The time available to use the output keeps getting shorter; real time is becoming very common
• Human resources are limited, and human performance is often unreliable due to fatigue and distraction. Therefore, automated real‐time sensor processing techniques are required to reliably detect and discriminate targets of interest
• Limited automated processing and tagging tools: still NOT enough
Evolution of Database Technology
• 1960s:
• Data collection, database creation, IMS and network DBMS
• 1970s:
• Relational data model, relational DBMS implementation
• 1980s:
• RDBMS, advanced data models (extended‐relational, OO, deductive, etc.)
• Application‐oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
• Data mining, data warehousing, multimedia databases, and Web databases
• 2000s:
• Stream data management and mining
• Data mining and its applications
• Web technology (XML, data integration) and global information systems
Even as clouds and big data take hold, the IT landscape is changing rapidly…
•Technology is rapidly being commoditized
•Businesses are more willing and able to shop for IT services
•In‐house IT infrastructure is increasingly seen as complex and rigid
•Unstructured data is the new gold
© Harvard Business Review
Big Data Numbers
• How much data is in the world?
• 800 Terabytes, 2000
• 160 Exabytes, 2006
• 500 Exabytes (Internet), 2009
• 2.7 Zettabytes, 2012
• 35 Zettabytes by 2020
• How much data is generated in ONE day?
• 7 TB, Twitter
• 10 TB, Facebook
Big data: The next frontier for innovation, competition, and productivity
McKinsey Global Institute
[Figure: Growth at the Edge of the Network. Global petabytes per day by year, 2003 to 2012, on an axis from 0 to 4,000. Labels: Traditional Computation; Converged Content; Mobile, Device to Device, Sensors, Entertainment, Smart Home, Distributed Industrial, Autos/Trucks, Smart Toys]
Internet of Things
• A system . . . that would be able to instantaneously identify any kind of object.
• Network of objects . .
• One major next step in this development of the Internet, which is to progressively evolve from a network of interconnected computers to a network of interconnected objects …
• From communicating people (Internet) ... to communicating items …
• From human triggered communication … to event triggered communication
Internet of Things and the Cloud
• It is projected that there will be 24 billion devices on the Internet by 2020. Most will be small sensors that send streams of information into the cloud where it will be processed and integrated with other streams and turned into knowledge that will help our lives in a multitude of small and big ways.
• The cloud will become increasingly important as a controller of and resource provider for the Internet of Things.
• As well as today’s use for smart phone and gaming console support, “Intelligent River”, “smart homes”, and “ubiquitous cities” build on this vision, and we could expect a growth in cloud supported/controlled robotics.
• Some of these “things” will be supporting science
• Natural parallelism over “things”
• “Things” are distributed and so form a Grid
Data available from “Internet of Things”
Sensors (Things) as a Service
Sensors as a Service
Sensor Processing as a Service (could use MapReduce)
A larger sensor ………
Output Sensor
https://sites.google.com/site/opensourceiotcloud/ Open Source Sensor (IoT) Cloud
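The slide suggests that sensor processing "could use MapReduce". As a rough illustration of that idea, the sketch below computes a per-sensor mean in a map, shuffle, and reduce style in plain Python. It is not code from the Open Source Sensor (IoT) Cloud project; the sensor IDs, readings, and function names are all hypothetical, and a real deployment would run these phases on a distributed framework.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical readings: (sensor_id, temperature in degrees Celsius).
readings = [
    ("sensor-a", 20.0),
    ("sensor-b", 31.0),
    ("sensor-a", 22.0),
    ("sensor-b", 29.0),
    ("sensor-c", 18.0),
]


def map_phase(record):
    """Emit (key, (sum, count)) so partial results can be merged later."""
    sensor_id, value = record
    return sensor_id, (value, 1)


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Merge (sum, count) pairs, then turn the totals into a mean."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return total / count


def mean_per_sensor(records):
    grouped = shuffle(map(map_phase, records))
    return {key: reduce_phase(values) for key, values in grouped.items()}


print(mean_per_sensor(readings))
```

Emitting (sum, count) pairs instead of raw values keeps the reduce step associative, so partial results from different workers could be merged in any order.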
Tapping into the Data
• Data Storage
• Reporting
• Analytics
• Advanced Analytics
– Computing with big datasets is a fundamentally different challenge than doing “big compute” over a small dataset
Unutilized data that can be available to business
Utilized data
Business, Knowledge, and Innovation Landscape
• Typically 80% of the key knowledge (and value) is held by 20% of the people – we need to get it to the right people
• Only 20% of the knowledge in an organization is typically used (the rest being undiscovered or under‐utilized)
• 80‐90% of the products and services today will be obsolete in 10 years – companies need to innovate & invent faster
“Big Data” and its close relatives “Cloud Computing”, “Social Media” and “Mobile” are the new frontier of innovation.
Driven by Data Science and Advanced Analytics
Volume, Variety, Velocity ………..
Volume
Volume is increasing at incredible rates. More people than ever use high speed internet connections, those people are becoming more proficient at creating content, and more people in general are contributing information. These combined forces are causing this tremendous increase in Volume.
Variety
Next in breaking down Big Data into easily digestible bite‐size chunks is the concept of Variety. Take your personal experience and think about how much information you create and contribute in your daily routine. Your voicemails, your e‐mails, your file shares, your TV viewing habits, your Facebook updates, your LinkedIn activity, your credit card transactions, etc.
Whether you consciously think about it or not, the Variety of information you personally create on a daily basis, and which is being collected and analyzed, is simply overwhelming.
Velocity
The speed at which data enters organizations these days is absolutely amazing. With very high internet bandwidth now nearly commonplace, and with the proliferation of mobile devices, people have more opportunity than ever to contribute content to storage systems.
[Figure labels: CRM Data, GPS, Demand, Speed, Velocity, Transactions, Opportunities, Service Calls, Customer, Sales Orders, Inventory, Emails, Tweets, Planning, Things, Mobile, Instant Messages]
VELOCITY: Worldwide digital content will double in 18 months, and every 18 months thereafter.
VOLUME: In 2005, humankind created 150 exabytes of information. In 2011, over 1,200 exabytes was created.
VARIETY: 80% of enterprise data will be unstructured, spanning traditional and non traditional sources.
But I Believe These are the Real Four
What matters when dealing with Data Science?
Scalability
Streaming
Context
Quality
Usage
As the world gets smarter, infrastructure demands will grow
Smart traffic systems
Smart water management
Smart energy grids
Smart healthcare
Smart food systems
Smart oil field technologies
Smart regions
Smart weather
Smart countries
Smart supply chains
Smart cities
Smart retail
Mobile Devices
• Mobile computers:
  – Mainly smartphones, tablets
• Sensors: GPS, camera, accelerometer, etc.
• Computation: powerful CPUs (≥ 1 GHz, multi‐core)
• Communication: cellular/4G, Wi‐Fi, near field communication (NFC), etc.
•Many connect to cellular networks: billing system
• Cisco: 7 billion mobile devices will have been sold by 2012
Organization
Plethora of “Big Data” related tools
• Data is raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized.
• Information is data that has been processed, organized, structured or presented in a given context so as to make it useful.
What we have / What we want
Data versus Information versus Action
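One way to picture the step from data to information to action is the small sketch below. It is only an illustration under stated assumptions: the heart-rate samples, the patient IDs, the 100 bpm threshold, and the function names are hypothetical and are not taken from the talk or from any clinical guideline.

```python
from statistics import mean

# Data: raw, unorganized facts (hypothetical heart-rate samples per patient).
raw_data = [
    ("patient-1", 72), ("patient-2", 118), ("patient-1", 75),
    ("patient-2", 124), ("patient-1", 70), ("patient-2", 121),
]


def to_information(samples):
    """Organize raw samples into an average reading per patient."""
    by_patient = {}
    for patient_id, bpm in samples:
        by_patient.setdefault(patient_id, []).append(bpm)
    return {pid: mean(values) for pid, values in by_patient.items()}


def to_actions(information, threshold_bpm=100):
    """Turn information into an action: flag patients above an assumed threshold."""
    return [pid for pid, avg in information.items() if avg > threshold_bpm]


information = to_information(raw_data)
print(information)
print(to_actions(information))
```

The raw tuples mean little on their own; grouping and averaging them gives information, and the threshold rule turns that information into something a workflow could act on.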
[Diagram: SocialCare Data Science Analytics]
Data sources: Patient Personal Data, Physician Office Data, Hospital Data, Hospital System Data, Regional Data, Statewide Data, National Data, Web Data
Data domains: Clinical, Financial, Operational
Other labels: Create Analytical Models, Deploy Analytical Models, Real Time Embedded Analytics, Investigative Analytics, Alerts, Notifications or Recommendations, Modifications to Workflow, Web, Models
SocialCare Confidential and Proprietary
Some Existing SocialCare Beta Relations
patients: identity, created_at, updated_at, external_id_hash, idx_1, idx_2, data
practice_patients: partice_identity, patient_identity, created_at, updated_at, mrn
patient_soap_notes: identity, practice_identity, patient_identity
practices: identity, name, settings, address, phone, deleted, created_at, updated_at, roles_and_permissions, symptoms, practice_type, practice_sub_type, customization
patient_data_store: patient_identity, classifier, signature, created_by, updated_by, created_epoch, updated_epoch, data
Diagram annotations:
• JSON data stored in this field as an array. No Postgres queries possible: Name, Address, etc.
• JSON data stored in this field as an array. No Postgres queries possible: Allergies, SOAP Notes, Medications, etc.
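The annotations say the JSON arrays in these fields cannot be queried from Postgres in this design. The sketch below illustrates what that typically costs: every row has to be fetched and decoded in application code before it can be filtered. The column names `patient_identity`, `classifier`, and `data` come from the slide; the row values, the JSON layout, and the function name `patients_with_allergy` are assumptions made purely for illustration.

```python
import json

# Hypothetical rows shaped like patient_data_store: the `data` column holds
# a JSON array serialized as text, so the database sees only an opaque string.
rows = [
    {"patient_identity": 3, "classifier": "allergies",
     "data": json.dumps([{"name": "penicillin"}, {"name": "latex"}])},
    {"patient_identity": 6, "classifier": "allergies",
     "data": json.dumps([{"name": "peanuts"}])},
    {"patient_identity": 9, "classifier": "medications",
     "data": json.dumps([{"name": "metformin"}])},
]


def patients_with_allergy(rows, allergy_name):
    """Scan every row and decode its JSON in application code."""
    matches = []
    for row in rows:
        if row["classifier"] != "allergies":
            continue
        entries = json.loads(row["data"])  # decoding happens outside the database
        if any(entry.get("name") == allergy_name for entry in entries):
            matches.append(row["patient_identity"])
    return matches


print(patients_with_allergy(rows, "latex"))
```

Because the filter runs after decoding, the work grows with the number of rows scanned, which is one reason opaque JSON fields become a bottleneck for analytics.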
SocialCare Example Objects and Relationships
[Diagram labels]
Objects: Patient #3, Patient #6, Patient #9 (Remote), Physician #1, Physician #2, Physician #3, Physician #10 (Remote), Practice #1, Practice #2, Practice #3, Hospital #1, Lab #1, Lab Request #7, Lab Response #8, Observation #1, SOAP Note #1, Clinical Quality Measures #1, Continuity Of Care #1, Continuity Of Care #2, Incoming Referral, Outgoing Referral, Xray #1 (Logical ID = 1; Version ID = 1, 2, and 3)
Stores and registries: Document Store, Patient Registry, Provider Registry
Relationships: Is Primary Care Physician For, Had Test, Works In, Work In, Has Sub‐practice, Has Quality Measure, Associated With, Made Observation, Had Observation, Annotated Document, Export CCD, Import CCD, Requestor, Subject, Response For, Source, Made Referral, Received Referral
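A diagram of objects and relationships like this one can be modeled as a list of labeled edges. The sketch below is a minimal illustration, not SocialCare's implementation: the node and relationship names are taken from the diagram labels, but which nodes each edge connects is assumed, and the helper names `related` and `physicians_who_observed` are hypothetical.

```python
from collections import defaultdict

# Edges as (subject, relationship, object). Node and relationship names come
# from the diagram; which nodes each edge connects is assumed for illustration.
edges = [
    ("Physician #1", "Is Primary Care Physician For", "Patient #6"),
    ("Physician #1", "Works In", "Practice #1"),
    ("Physician #2", "Works In", "Practice #2"),
    ("Practice #1", "Has Sub-practice", "Practice #2"),
    ("Patient #6", "Had Test", "Lab #1"),
    ("Physician #2", "Made Observation", "Observation #1"),
    ("Patient #6", "Had Observation", "Observation #1"),
]

# Index edges by (subject, relationship) for direct lookups.
outgoing = defaultdict(list)
for subject, relationship, obj in edges:
    outgoing[(subject, relationship)].append(obj)


def related(subject, relationship):
    """Return every object linked to `subject` by `relationship`."""
    return outgoing.get((subject, relationship), [])


def physicians_who_observed(patient):
    """Follow Had Observation, then find who made each observation."""
    observations = set(related(patient, "Had Observation"))
    return sorted(s for s, r, o in edges
                  if r == "Made Observation" and o in observations)


print(related("Physician #1", "Works In"))
print(physicians_who_observed("Patient #6"))
```

Storing relationships as explicit edges lets a question such as "who observed this patient" be answered by following links, instead of decoding documents one by one.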
Conclusion
• The Age of Data is here
• Data is the central business asset
• Data generation has changed forever
• The World is moving to Real Time
• Data Science is the Key
• Your legacy analytic software WILL fail in the Age of Data
• Crisis of software that scales to meet demand
• Advanced Analytics must be embedded in the collectors and sensors
• Think about where the data comes from
• Attempt to capture and analyze any data that might be relevant, regardless of where it resides
• Data Science is changing how data is:
  • Collected, discovered, analyzed, used, acted upon …