View Dr. Robin Bloor's presentation from the Dec. 2013 Big Data Conference in Rome.
Analytics and Big Data Analytics
Robin Bloor, Ph.D.
The Sequence of Topics….
1 Data Science?
2 The Nature of Analytics
3 Machine Learning Et Al
4 The Business Perspective
5 The Future
1
What Is Data Science?
There is no “data science.” It’s a misnomer
All science is empirical and involves data analysis.
Science implements a method.
So do statisticians
What Is A Data Scientist?
Project manager
Qualified statistician
Domain business expert
Experienced data architect
Software engineer
(It’s a team)
Data Scientists v. Business Analysts
Claims that business analysts can be data scientists are dubious
Good practitioners of statistics understand data (from years of training)
Software understands nothing; it simply implements algorithms
Who Understands Data?
Nevertheless!
You can know more about a business from its data than by any other means
2
The Nature of Analytics
The Field of Business Intelligence
The Driving Force is Insight
A Process Not An Activity
Data Analytics is a multi-disciplinary end-to-end process
Until recently it was a walled garden. But recently the walls were torn down by…
Data availability
Scalable technology
Open source tools
The Data Analytics Process - Detail
The CRITICAL Workload Issue
Previously, we viewed database workloads as an I/O optimization problem
With analytics, the workload is a highly variable mix of I/O and calculation
No databases were built for this – not even Big Data databases
3
Machine Learning Et Al
Analytical Latencies
1 Data access
2 Data preparation
3 Model development
4 Execution
5 Implementation
6 Model Audit & Update
Speed = value (probably)
The Open Source Dynamic
The R Language: over 1 million users
Hadoop and its Ecosystem: reduced latency for analytics
Machine Learning Algorithms: raw power
None of these are engineered for performance
Machine Learning Algorithms - 1
There are many:
Neural network(s)
Bayesian networks
Decision trees/random forests
Support vector machines
K-means
Clustering
Regression(s)
Etc.
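To make one entry in this list concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration only, not something taken from the presentation: the function name kmeans, the sample points, and the starting centroids are all assumptions chosen for the example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Toy k-means: alternate between assigning points and moving centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

Every pass recomputes a distance from each point to each centroid, which is one small example of why these methods lean so heavily on processing power.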
Machine Learning Algorithms - 2
They are not newly invented
We did not previously use them much because we never had the computer power
Now that we have the power (at a price) we can employ them
Machine Learning Algorithms - 3
Machine learning algorithms can check all possibilities
We never had the computer power
Now that we have the power (at a price) we can employ them
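One narrow reading of "check all possibilities" is brute-force search: try every candidate rule and keep the best. The sketch below does this for a one-variable threshold rule (a "decision stump"). The data, the column meanings, and the name best_stump are hypothetical and serve only to illustrate the idea; practical algorithms usually prune or approximate this search.

```python
def best_stump(rows, labels):
    """Try every (feature, threshold) pair; return (errors, feature, threshold)."""
    best = None
    for feature in range(len(rows[0])):
        # Every distinct observed value is a candidate threshold.
        for threshold in sorted({row[feature] for row in rows}):
            predictions = [row[feature] > threshold for row in rows]
            errors = sum(p != y for p, y in zip(predictions, labels))
            # Keep the first rule found with the fewest errors.
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best


# Hypothetical customer records: (age, purchases last year).
rows = [(25, 1), (40, 3), (35, 0), (50, 4), (23, 2), (60, 5)]
# Hypothetical outcome to predict for each record.
labels = [False, True, False, True, False, True]

best = best_stump(rows, labels)
print(best)
```

With 2 features and 6 distinct values each, the search evaluates only 12 rules. The count grows with every extra variable and row, which is where the computing power comes in.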
The Impact?
Machine learning and processing power (parallelism) will change the data analysis process
The analytics team needs to understand IT
4
The Business Perspective
Business Metamorphosis
The role of data analysis has not changed
Only the speed has changed
The process will evolve
It will be disruptive for incumbent vendors
The Data Analysis Budget
Data Analysis is Business R&D
The focus is on business process
The outcome of successful R&D is a changed process
Think of manufacturing for a useful analogy
5
The Future
It ain't over till the fat lady sings
Hardware disruption
Software disruption
Business process disruption
All we know is:
Analytical processing will get faster
Analytic latencies will reduce
Data will continue to grow
Analytics will be a differentiator
In Summary…
1 Data Science?
2 The Nature of Analytics
3 Machine Learning Et Al
4 The Business Perspective
5 The Future
Thank you very much for your attention