Data Analysis and Visualization. Dr. Frank van Ham, IBM Netherlands. Target Conference 2014, Groningen, Nov 4th, 2014


Page 1

Data Analysis and Visualization

Dr. Frank van Ham, IBM Netherlands

Target Conference 2014, Groningen

Nov 4th, 2014

Page 2

[insert obligatory ‘Big Data’ slide here]

“No, no, let’s not throw that away. I might need that in the future”

“Hyper intelligent computer systems crunching mega giga tera exa lots of bytes of data”

Page 3

Analytic algorithms to the rescue!

Page 4

Our world in 20 years?

Page 5

Descriptive statistics don’t always tell us everything about data

μ = 7.5, σ² = 4.12, correlation = 0.81 and regression: y = 3 + 0.5x
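The figures on this slide match the well-known Anscombe's quartet, four small data sets with nearly identical summary statistics. The slide does not name its data, so treating it as Anscombe's quartet is an assumption. Under that assumption, a minimal Python sketch (the names `QUARTET`, `summarize` and `SUMMARIES` are illustrative only) shows how four different data sets collapse onto the same numbers:

```python
# Assumption: the slide's statistics come from Anscombe's quartet;
# the slide itself does not name the data set.
from math import sqrt

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets I-III
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean and sample variance of y, Pearson correlation, and the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_y": mean_y,
        "var_y": syy / (n - 1),           # sample variance of y
        "corr": sxy / sqrt(sxx * syy),    # Pearson correlation coefficient
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(f"{name:>3}: mean={s['mean_y']:.2f} var={s['var_y']:.2f} "
          f"corr={s['corr']:.2f} y = {s['intercept']:.2f} + {s['slope']:.2f}x")
```

The four printed lines are almost indistinguishable, yet scatter plots of the four sets look nothing alike (a noisy line, a curve, a line with one outlier, and a vertical cluster with one extreme point), which is the point the slide makes.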

Page 6

Interpreting statistics is not a simple task for automated systems.

Page 7

Analytic results should be used with care and supervision

“A computer lets you make more mistakes faster than any invention in human history, with the possible exceptions of handguns and tequila.”

(Mitch Ratcliffe)

Page 8

Big Data and Big Data analytics schematized

[Diagram. Boxes: Real world; Big Data world; Analytic Systems (Statistics / Heuristics); Human “Model”; Simplified machine “Model”. Arrow labels: Measure; Compute; Verify / Monitor; Influence (twice).]

Page 9

“The lame leading the blind” – J. Turcan

Humans are slow at computing statistics, but fast at contextualizing (though not necessarily good).

+

Computers are bad at grasping context, but very fast at computing statistics.

=

Humans can lead computers in the right direction, with computers doing the “heavy lifting”.

Page 10

To work with our data reliably, we need to understand it

But unfortunately, our data is inside a computer system…

Page 11

To understand our data, we need to see our data

Page 12

Visualization is not a cure-all magic technology that allows humans to instantly understand data…

Visualization is a medium to bridge the “last 50 cm” in data analysis.

[Figure with distances labeled 20 cm and 50 cm]

Page 13

Industry data tools trends: from reporting to user-driven analytics

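[Diagram, three rows. Past: Data warehouse → (Daily) Report → User. Current: Data warehouse → Real-time, on-demand Report → User. Future: Data warehouse → Visual Interface → User, where the user drives analytics and receives algorithm results. “Analytics” boxes appear twice in the diagram.]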

Page 14

Industry data tools trends: from IT to Line of Business user

Page 15

21st century Big Data BI will require tools that

• Deal better with our data
– Connect to data transparently in whatever form
– Can mash different data sources together intelligently
– Automatically clean and model our data where appropriate

• Deal better with us
– Are simple and flexible to query
– Communicate with us in human-friendly ways
– Are smart enough to use best (business) practices

• Make analytics accessible to everyone
– Act as analytics-based guides in our data
– Allow non-expert users to work with analytic algorithms
– Turn analytics into actionable insights

Page 16

IBM Watson Analytics – IBM’s push into this area

Visit http://watsonanalytics.com and sign up for our free beta!

Page 17

In summary

• To realize the possibilities of Big Data we need both
– Scalable infrastructure
– Tools that allow us to make sense of all this data

• Visualization and analytic algorithms are essential for data analysis
– One does the heavy lifting
– One tells us where we’re going

• Research/design problems to target, from a business perspective
– Data-generic data visualization tools
– Simplifying statistics output
– Different input modalities
– Pluggable analytic algorithms

Page 18

Please Note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Page 19

Thank you! Questions? Remarks?

[email protected]