Big Data: Characteristics
• Volume
• Velocity
• Variety
• Veracity
• Value
• …
• Vancouver
J. Pei: Big Data Analytics -- Introduction 2
J. Pei: Connecting Big Data with Many People 3
What Is Big Data?
• “Big data is like teenage sex:
– everyone talks about it,
– nobody really knows how to do it,
– everyone thinks everyone else is doing it,
– so everyone claims they are doing it...”
– Dan Ariely
Is Big Data Really New?
• (Genesis 28:15) “I am with you and will watch over you wherever you go”
• “Whispers in a private chamber reach Heaven like thunder; deceit in a dark room is seen by the gods like lightning; the rewards of good and evil follow like a shadow”
• Similar statements appear in the Quran and the Sutras
• People were aware of the existence of big data a long time ago, but no one could access it until very recently
Installation of Pope Benedict, 2005
Source: http://enterprise-it-architecture.blogspot.ca/2013/05/dinner-and-little-disruption.html
Installation of Pope Francis, 2013
Source: http://enterprise-it-architecture.blogspot.ca/2013/05/dinner-and-little-disruption.html
Big Data Is About …
• Volume
• Velocity
• Variety
• Veracity
• Value
• Connecting data with people
– Accessibility
  • Second use of data
  • Connecting data from multiple sources
– Capability of analyzing big data
  • Background knowledge
  • Context-aware analysis
  • Interactive analysis
© 2011 Virgin Group
Tools for Big Data Analytics
• Statistics – how to make statistics scalable?
– Data volume
– Changes
– Complexity
• Computation – how to make computation based on scientific principles?
– Justifiable and reliable inferences
“If you torture the data long enough, it will confess” – Ronald H. Coase
What Is This Course About?
• Some computational statistics tools that may be useful in big data analytics
• Investigate some recent case studies on how to use computational statistics tools in big data analytics
• Identify some interesting problems that may be approachable by computational statistics tools
Content – Sampling Techniques
• Basic ideas
• Simple random sampling
• Confidence intervals
• Estimating proportions and subpopulation means
• Unequal probability sampling
• Core ideas
– Sample size
– Confidence levels
– Use of unbiased estimators
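As a rough illustration of how simple random sampling, sample size, and confidence levels fit together, the sketch below estimates a proportion from a sample drawn without replacement and attaches a normal-approximation confidence interval. It is not taken from the course materials: the function name, the synthetic population, and the choice of z = 1.96 (about a 95% confidence level) are assumptions made for this example.

```python
import math
import random


def estimate_proportion(population, sample_size, z=1.96, seed=0):
    """Estimate the share of True values from a simple random sample.

    Hypothetical helper for illustration; z=1.96 assumes a ~95% level
    under the normal approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # draw without replacement
    p_hat = sum(sample) / sample_size             # unbiased estimator of the proportion
    # Finite population correction, because sampling is without replacement.
    fpc = 1 - sample_size / len(population)
    std_err = math.sqrt(fpc * p_hat * (1 - p_hat) / (sample_size - 1))
    return p_hat, (p_hat - z * std_err, p_hat + z * std_err)


# Synthetic population (an assumption): 100,000 records, 30% with the attribute.
population = [True] * 30_000 + [False] * 70_000
p_hat, (low, high) = estimate_proportion(population, sample_size=1_000)
print(f"estimate={p_hat:.3f}, interval=({low:.3f}, {high:.3f})")
```

Quadrupling `sample_size` roughly halves the interval width, which is the usual trade-off between sampling cost and precision.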
Content – Computational Statistics
• Basics of optimization methods
• Combinatorial optimization
• EM optimization
• Simulation and Monte Carlo (MC) integration
• Markov chain Monte Carlo (MCMC)
• Bootstrapping
• Nonparametric density estimation
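To give a flavor of one topic in this list, here is a minimal sketch of bootstrapping: resample the data with replacement many times, recompute a statistic on each resample, and read a percentile interval off the results. The function name, the choice of the median as the statistic, the number of resamples, and the synthetic data are all assumptions made for this example, not material from the course.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Synthetic skewed data (an assumption): 200 draws from an exponential distribution.
data_rng = random.Random(1)
data = [data_rng.expovariate(1.0) for _ in range(200)]
low, high = bootstrap_ci(data)
print(f"sample median={statistics.median(data):.3f}, interval=({low:.3f}, {high:.3f})")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the simplest to state.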
Big Data Is the Message!
• “The medium is the message. … in the long run a medium’s content matters less than the medium itself in influencing how we think and act.”
– (Herbert) Marshall McLuhan
• Big data is a current medium