Dr. Prasad A. Naik, Professor, UC Davis

Challenges and Opportunities in Big Data Analytics

@iValleyIC #FinTechTalk

http://ivalley.co

http://gsm.ucdavis.edu/faculty/prasad-naik

Big Data: Volume, Variety, Velocity

[Figure: data matrix with dimensions N and p; as the data matrix grows, p exceeds N]

Theoretically “Big Data” means …

Standard Theory

• Sample size N → ∞

• Number of variables p fixed

• Ratio p/N becomes negligible

• Result?

– Tall data matrix

– p < N

Big Data Theory

• Sample size N → ∞

• Number of variables p → ∞ at a faster rate than N does

• Ratio p/N remains “Big”

• Result?

– Long data matrix

– p > N

Big Data?

Standard Data

• Data Matrix (Tall)

• Large N, Smaller p

– p < N

Big Data

• Data Matrix (Long)

• Large N, but Larger p

– p > N

Got Big Data, but where are my Big Insights?

Two Challenges

• When N is large, but p < N

o Computational challenges

– Storage, retrieval

– Parallel computing, real-time analysis

• When N is large, but p > N

o Statistical challenges

– All standard methods break down!

Need New Analytics for Big Data

Linear Regression

Logistic Regression

Principal Components

Factor Analysis

Don’t work when p > N
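To see why ordinary regression stops working when p > N, here is a minimal sketch on made-up toy data (2 observations, 3 variables; the numbers are illustrative assumptions, not from the talk). Two different coefficient vectors fit the data perfectly, and the Gram matrix X'X is singular, so least squares cannot pick a unique answer.

```python
# Toy data (hypothetical): N = 2 observations, p = 3 variables, so p > N.
X = [[1, 0, 1],
     [0, 1, 1]]
y = [1, 2]

def predict(X, beta):
    """Return the fitted values X @ beta using plain Python lists."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

def gram(X):
    """Return the p-by-p Gram matrix X'X."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]

def det3(m):
    """Determinant of a 3-by-3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Two different coefficient vectors that both reproduce y exactly.
beta_a = [1, 2, 0]
beta_b = [0, 1, 1]
fits_a = predict(X, beta_a) == y
fits_b = predict(X, beta_b) == y

# X'X is singular when p > N, so the normal equations have no unique solution.
gram_det = det3(gram(X))
print(fits_a, fits_b, gram_det)
```

Because both vectors fit equally well, the data alone cannot say which variables matter; some extra constraint is needed to choose among them.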

Opportunity: Sparsity constraints pave the way

Standard Techniques

Big Data Analytics

Sparse Analytics for Big Data

Don’t work when p > N:

• Linear Regression

• Logistic Regression

• Principal Components

Work even when p > N:

• Lasso Regression

• Sparse Logistic

• Sparse PCA
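As a rough illustration of how a sparse method can still return one answer when p > N, the sketch below runs a plain coordinate-descent lasso on hypothetical toy data (2 observations, 3 variables). The function names, the value of `lam`, and the objective scaling (1/(2N)) * RSS + lam * sum|beta_j| are assumptions chosen for this example, not details from the talk.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, sweeps=200):
    """Coordinate descent for (1/(2N)) * ||y - X b||^2 + lam * sum(|b_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # an all-zero column carries no information
            # Average correlation of column j with the residual that
            # excludes column j's own current contribution.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

# Toy data (hypothetical): p = 3 variables but only N = 2 observations.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

beta = lasso_cd(X, y, lam=0.25)
print(beta)  # approximately [0.0, 0.5, 1.0]: the first variable is dropped
```

On this toy problem the penalty sets the first coefficient to exactly zero and keeps the other two, which is the sense in which the sparsity constraint selects variables.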

How to instill sparsity? Many ways …

Elastic Net Penalty

Lasso Penalty
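The two penalties named above can be written down in a few lines. This is a sketch under one common parameterization (weights `lam`, `lam1`, `lam2` are illustrative names; software packages often scale these terms differently): the lasso penalizes the sum of absolute coefficients, and the elastic net adds a squared-coefficient term to it.

```python
def lasso_penalty(beta, lam):
    """Lasso (L1) penalty: lam * sum of |beta_j|."""
    return lam * sum(abs(b) for b in beta)

def elastic_net_penalty(beta, lam1, lam2):
    """Elastic net penalty: lam1 * sum|beta_j| + lam2 * sum(beta_j ** 2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam1 * l1 + lam2 * l2

# Illustrative coefficient vector (made up for this example).
beta = [1.0, -2.0, 0.0]
lasso_value = lasso_penalty(beta, lam=0.5)                     # 0.5 * 3 = 1.5
enet_value = elastic_net_penalty(beta, lam1=0.5, lam2=0.25)    # 1.5 + 0.25 * 5 = 2.75
print(lasso_value, enet_value)
```

Either penalty is added to the usual fit criterion (for example, the residual sum of squares); setting `lam2` to zero reduces the elastic net to the lasso.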

Two Marketing Applications

• What drives charisma of CEOs and Founders?

• What drives liking for Super Bowl Ads?

Impact of Nonverbal Communication on Charisma of CEOs/Founders

N = 22 sales pitches, each 1 minute long

p = 100+ variables

Mine the gestures

Shoot videos

p/N ratio ≈ 5X
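The ratio quoted for this study can be checked with a small helper. This is only a sketch: the function name is hypothetical, and p = 100 is used as a lower bound because the slide says "100+" variables.

```python
def describe_shape(n_obs, n_vars):
    """Return the p/N ratio and whether the data matrix is tall or long."""
    ratio = n_vars / n_obs
    if n_vars > n_obs:
        regime = "long (p > N)"
    elif n_vars < n_obs:
        regime = "tall (p < N)"
    else:
        regime = "square (p = N)"
    return ratio, regime

# N = 22 sales pitches; p = 100 is a lower bound for "100+ variables".
ratio, regime = describe_shape(22, 100)
print(round(ratio, 2), regime)  # 4.55 long (p > N)
```

With more than 100 variables the ratio only grows, so a figure of roughly 5X is consistent with the counts on the slide.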

Takeaways

• Big Data needs Sparse Analytics

o It’s not the size -- it’s the relative size

• When p/N < 1, usual statistical tools work

• When p/N > 1, sparsity needs to be incorporated

• Many ways to incorporate sparsity

o Lasso, Elastic Net

o Depends on the goals of the project

• These methods work in finance too, not just marketing!

UC Davis Launches Master of Science in Business Analytics

MSBA Program

• Starts in Fall 2017

• 10-month or 19-month

• Equal emphasis on hard and soft skills

– Hard: Data + Analytics

– Soft: Business + Practicum

You can help!

• Encourage your smart employees to enroll

• Contribute real data-driven projects for student-teams to tackle

• Be a guest speaker or donor

http://gsm.ucdavis.edu/msba-masters-science-business-analytics

Questions?

Contact me [email protected]