Big Data Analytics Module 4 – Data Mining and Predictive Analytics Including Mahout Saptak Sen, Microsoft Bill Ramos, Advaiya


Big Data Analytics
Module 4 – Data Mining and Predictive Analytics Including Mahout

Saptak Sen, Microsoft
Bill Ramos, Advaiya


Agenda

• Overview of predictive analytics & data mining

• How Microsoft supports predictive analytics

• How Mahout fits into the picture

• Demos


Data Mining


Predicting future performance from historical data

"Predictive analytics should address the likelihood of something happening in the future, even if it is just an instant later."*

• Recommendation engines

• Advertising analysis

• Weather forecasting for business planning

• Social network analysis

• IT infrastructure and web app optimization

• Legal discovery and document archiving

• Pricing analysis

• Fraud detection

• Churn analysis

• Equipment monitoring

• Location-based tracking and services

• Personalized insurance

*Source: Ventana Research, Predictive Analytics Benchmark Research Report, March 2012.


Data mining tool in SQL Server Analysis Services

• Rich data mining algorithms, for clustering, classification, forecasting through time series analysis, and more

• Rich developer experience


Analysis Services Data Mining Algorithms

• Classify: Decision Trees, Logistic Regression, Naïve Bayes, Neural Networks

• Estimate: Decision Trees, Linear Regression, Logistic Regression, Neural Networks

• Cluster: Clustering

• Forecast: Time Series

• Associate: Association Rules, Decision Trees


Data mining add-in for Excel

• Ease of use through Excel

• Rich data mining algorithms for clustering, prediction, forecasting, market basket analysis, and more

• Scalable through integration with SSAS


Algorithms: Data Mining Add-in for Excel

Menu item: data mining algorithm

• Analyze Key Influencers: Naïve Bayes

• Detect Categories: Clustering

• Fill From Example: Logistic Regression

• Forecast: Time Series

• Highlight Exceptions: Clustering

• Scenario Analysis – Goal Seek: Logistic Regression

• Scenario Analysis – What If: Logistic Regression

• Prediction Calculator: Logistic Regression

• Shopping Basket Analysis: Association Rules
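
To make the Shopping Basket Analysis entry more concrete, here is a minimal sketch of the idea behind association rules: counting how often items appear together (support) and how often one item implies another (confidence). It is not the add-in's or SSAS's actual implementation. The baskets, the `pair_rules` function, and the `min_support` threshold are all hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Real association-rule miners handle itemsets larger than pairs and add further measures such as lift. This sketch stops at pairs to keep the counting visible.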


Demo 1: Excel Data Mining Add-In

[Architecture diagram. Components: flat files (.txt, .dat, .xlsx, etc.), Windows Azure HDInsight, Microsoft Excel with the Excel Data Mining Add-in. Layers: Batch Layer, Speed Layer, Serving Layer.]


Mahout


Mahout

• Scalable machine learning algorithms on the Hadoop platform

• Algorithms for clustering, classification, and batch-based collaborative filtering using the MapReduce paradigm

• Supports a wide range of use cases, from email spam filtering to fraud detection to recommendations for books or movies

[Component diagram. Labels: Applications, Examples, Clustering, Recommenders, Vector Similarity, Pattern Mining, Classification, Regression, Genetic, Dimension Reduction, Matrices, Collocations.]
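
The collaborative filtering mentioned in the bullets above can be illustrated with a small single-machine sketch. It uses plain Python rather than Mahout's API or MapReduce, and shows only the item-based idea: score an unseen item by how similar its rating pattern is to items the user has already rated. The ratings data and the names `item_vectors`, `cosine`, and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}), invented for illustration.
ratings = {
    "alice": {"book_a": 5.0, "book_b": 3.0},
    "bob": {"book_a": 4.0, "book_b": 2.0, "book_c": 5.0},
    "carol": {"book_b": 4.0, "book_c": 1.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors = {}
    for user, prefs in ratings.items():
        for item, rating in prefs.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted rating."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        weighted, total = 0.0, 0.0
        for item, rating in seen.items():
            sim = cosine(vectors[candidate], vectors[item])
            weighted += sim * rating
            total += sim
        if total > 0:
            scores[candidate] = weighted / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


recommendations = recommend("alice", ratings)
print(recommendations)
```

A distributed implementation would split the item-vector and similarity computations across map and reduce steps. The arithmetic per item pair is the part this sketch shows.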


Demo 2: Mahout

[Architecture diagram. Steps shown: flat files (.txt, .dat, .xlsx, etc.), convert to Mahout input, run the Mahout job from the Hadoop Command Window, output file. Components: Windows Azure HDInsight, HDInsight Consoles, Hadoop Command Window. Layers: Batch Layer, Speed Layer, Serving Layer.]
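
The slide does not say which Mahout job the demo runs, so the following is only a sketch of what the "convert to Mahout input" step might look like if it were a recommender job. It assumes a tab-separated flat file of (user, item, rating) rows with no header, and rewrites it as comma-separated `userID,itemID,preference` lines with numeric IDs, the layout Mahout's file-based recommender data model reads. The function name `to_mahout_csv` and the sample rows are hypothetical.

```python
import csv
import io


def to_mahout_csv(source, target):
    """Rewrite tab-separated (user, item, rating) rows as numeric-ID CSV rows.

    Returns the user and item ID mappings so results can be translated back.
    """
    user_ids, item_ids = {}, {}
    writer = csv.writer(target, lineterminator="\n")
    for row in csv.reader(source, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        user, item, rating = row
        # Assign sequential numeric IDs the first time a name is seen.
        uid = user_ids.setdefault(user, len(user_ids) + 1)
        iid = item_ids.setdefault(item, len(item_ids) + 1)
        writer.writerow([uid, iid, float(rating)])
    return user_ids, item_ids


# Hypothetical flat-file content, held in memory so the sketch is self-contained.
sample = io.StringIO("alice\tbook_a\t5\nbob\tbook_a\t4\nbob\tbook_c\t5\n")
converted = io.StringIO()
user_ids, item_ids = to_mahout_csv(sample, converted)
print(converted.getvalue())
```

With real files, `source` and `target` would be file objects opened with `newline=""`, and the returned mappings would be saved so the numeric IDs in the output file can be mapped back to names.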


Questions?
