Machine Learning · 2012-01-03

Machine Learning

Tom Maiaroto, @shift8creative

What is Machine Learning?

Algorithms & Approaches

Decision trees 

Random forests 

Artificial neural networks 

 k-NN (nearest neighbour) 

 Naive Bayesian classifier

So could machines one day rule the earth?

Maybe (ok, probably not)

What can Machine Learning do for Apps?

Spam filtering

Auto-tagging

All sorts of categorization

Sentiment analysis

Languages Commonly Used

• Java: Java-ML, WEKA, Apache Mahout, many more...

• Python: NLTK, scikit-learn, PyML, a good deal more...

• C++: libDAI, Armadillo, Orange, tons more...

and then some others...

http://www.mloss.org

MongoDB Too!

• Map/Reduce

• Stored JavaScript

• Geo-spatial Indexing

• Replication

Geo-spatial Indexing

Did someone say nearest neighbour?

Design geeks, imagine the visualizations...
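To make the nearest-neighbour connection concrete, here is a minimal k-NN sketch in plain Python. It brute-forces the distances itself; with a geo-spatial index, MongoDB could answer the "nearest" part in the database (for example with a `$near` query). The points, labels, and function names below are made up for illustration.

```python
import math
from collections import Counter

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def knn_label(query, labelled_points, k=3):
    """Return the majority label among the k points nearest to `query`."""
    nearest = sorted(labelled_points, key=lambda p: haversine_km(query, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled points: ((lat, lon), label)
POINTS = [
    ((40.71, -74.00), "east"),   # around New York
    ((42.36, -71.06), "east"),   # around Boston
    ((39.95, -75.17), "east"),   # around Philadelphia
    ((34.05, -118.24), "west"),  # around Los Angeles
    ((37.77, -122.42), "west"),  # around San Francisco
]
```

Calling `knn_label((41.0, -73.5), POINTS)` votes among the three closest points and returns their majority label.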

Replication

• Store massive amounts of data

• Distributed performance benefits

• Dedicated databases for calculations   

All the obvious benefits.

Map/Reduce

It's the brain.

It's not just for aggregation.

It's faster than you might think.

It runs in the database.

Map/Reduce

In the computer...
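A map/reduce job has a map step, a reduce step, and optionally a finalize step. The sketch below imitates those three steps in plain Python so the data flow is visible; it is not MongoDB's implementation, and the documents and function names are invented for illustration.

```python
from collections import defaultdict
from functools import reduce

def map_doc(doc):
    """Map step: emit (key, value) pairs, here one (word, 1) pair per word."""
    for word in doc["text"].lower().split():
        yield word, 1

def reduce_values(a, b):
    """Reduce step: combine two values emitted for the same key."""
    return a + b

def finalize(key, value, total):
    """Finalize step: post-process a reduced value, here into a share of all words."""
    return {"count": value, "share": value / total}

def map_reduce(docs):
    # Group every emitted value under its key.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    # Fold each key's values down to one value, then finalize.
    reduced = {key: reduce(reduce_values, values) for key, values in grouped.items()}
    total = sum(reduced.values())
    return {key: finalize(key, value, total) for key, value in reduced.items()}

DOCS = [{"text": "the cat sat"}, {"text": "the dog sat down"}]  # made-up documents
RESULT = map_reduce(DOCS)
```

The finalize step is where "not just for aggregation" shows up: it can turn raw counts into ratios, scores, or anything else derived from the reduced values.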

Example Time!

It's simple...

Just take this...

Example Time!

Just kidding...

Let's Break Down a Naive Bayes Classifier

Classification/Naive Bayes

Training the System

Simple...

$inc

Classification/Naive Bayes

Just Keep Count of Words per Category
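Training really is just counting. The sketch below keeps the counts in plain Python dictionaries, where each `+= 1` plays the role that an `$inc` update would play in MongoDB. The model layout, field names, and training sentences are assumptions made for illustration, not the schema from the talk.

```python
from collections import defaultdict

def new_model():
    """Empty model: word counts per category, totals per category, and a grand total."""
    return {
        "total_words": 0,
        "category_totals": defaultdict(int),
        "categories": defaultdict(lambda: defaultdict(int)),
    }

def train(model, category, text):
    """Increment the per-category count of every word in `text`."""
    for word in text.lower().split():
        model["categories"][category][word] += 1   # the in-memory stand-in for $inc
        model["category_totals"][category] += 1
        model["total_words"] += 1
    return model

model = new_model()
train(model, "arts", "the film won an award")
train(model, "science", "the probe reached orbit")
train(model, "science", "the film of dust was thin")
```

Because training only ever increments counters, new examples can be added at any time without recomputing anything.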

Classification/Naive Bayes

Reduce:

Classification/Naive Bayes

Finalize:

Classification/Naive Bayes

Call the Command:

Classification/Naive Bayes

Results:

Can see total words.

Can also see word counts per category.

...and of course the scores per category...

cae = arts and entertainment

cs = science...
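One common way to turn word counts into scores per category is sketched below: log-probabilities with add-one (Laplace) smoothing. This is a generic Naive Bayes formula, not necessarily the one used in the talk, and the counts and category names are made up. The prior here is based on each category's share of all words, which is also an assumption.

```python
import math

# Hypothetical trained counts: how often each word was seen per category.
COUNTS = {
    "arts": {"film": 3, "music": 2, "award": 1},
    "science": {"probe": 2, "orbit": 2, "film": 1},
}

def scores(text, counts):
    """Log-probability score per category with add-one (Laplace) smoothing."""
    vocab = {word for words in counts.values() for word in words}
    grand_total = sum(sum(words.values()) for words in counts.values())
    result = {}
    for category, words in counts.items():
        cat_total = sum(words.values())
        # Prior: this category's share of all counted words (an assumption).
        score = math.log(cat_total / grand_total)
        for word in text.lower().split():
            # Smoothed likelihood, so unseen words never give log(0).
            score += math.log((words.get(word, 0) + 1) / (cat_total + len(vocab)))
        result[category] = score
    return result

def classify(text, counts):
    """Return the category with the highest score."""
    s = scores(text, counts)
    return max(s, key=s.get)
```

The scores are negative log-probabilities, so the category whose score is closest to zero wins.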

Classification/Naive Bayes

• Accurate even with little training

• MongoDB on a small VM took 1.7 seconds

• Compared to, say, PHP: 33 seconds, and it timed out

• More training data == exponentially faster than PHP

Classification/Naive Bayes

• This wasn't even a full map/reduce

• Your mileage will vary based on formula

• You can cache certain values for speed

• Don't forget about stored JavaScript (but use it wisely)

Porter Stemming Algorithm

 Thank You Martin Porter

http://tartarus.org/martin/PorterStemmer

Porter Stemming Algorithm

• Exists for nearly every language

• MongoDB will use JavaScript of course

• Decent execution time
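For a feel of what the stemmer does, here is only step 1a of the Porter algorithm (plural suffixes), sketched in Python. The full algorithm has several more steps; the function name is made up for this example.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter stemmer (plural suffixes)."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word
```

The rules are tried longest suffix first, which is why "caress" keeps its double s while "cats" loses its s.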

Porter Stemming Algorithm

• About 2.5x faster than a PHP class

• 663x faster than a web browser

• 7x slower than the PHP PECL extension

Real World Application

Social Harvest

Analyzes social data from the internet to determine languages spoken, gender, age, sentiment, and categories.

www.social-harvest.com

Who doesn't like pie charts?

Follow Tom

@shift8creative

www.shift8creative.com

www.social-harvest.com  

www.union-of-rad.com

 Thank You!
