Webinar - Comparative Analysis of Cloud based Machine Learning Platforms

Comparative Analysis of Cloud-based Machine Learning Platforms

Amazon ML, Azure ML, Databricks Cloud

ThirdEye Consulting Services & Solutions LLC. | thirdeyecss.com | results@thirdeyecss.com | @thirdeyecss | 408-462-5257

DATA ANSWERS

Disclaimer

• ThirdEye is a direct vendor to Microsoft, Amazon & Google.
• ThirdEye has implemented numerous Big Data projects for them over the last 3 years.
• ThirdEye is NOT a reseller of the cloud services of these companies.
• ThirdEye does NOT financially benefit from making any of the following recommendations.
• This work is purely meant as a technical evaluation of the ML platforms and should not be construed as serving any other purpose.

Comparison Approach: What do we look for in an online Machine Learning Platform?

Data Preparation

• Data Ingestion (out-of-the-box support of data sources) & Data Export
• Data Cleaning, Transformation, Visualization

Data Selection

• Feature selection/engineering

Algorithms

• Which algorithms are supported out of the box? Can you modify them or create new ones?
• Saving/comparing results

Optimize

• E.g. identify the optimal parameter settings for algorithms

Knowledge
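
As an illustration of the Optimize stage, here is a minimal sketch of an exhaustive grid search over parameter settings. It is not taken from any of the three platforms: `grid_search`, `toy_validation_score`, and the parameter names `learning_rate` and `regularization` are hypothetical, and the toy scoring function merely stands in for "train a model and return its validation score".

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for training a model and scoring it on validation data.
# It peaks at learning_rate=0.1, regularization=1.0.
def toy_validation_score(learning_rate, regularization):
    return -((learning_rate - 0.1) ** 2) - ((regularization - 1.0) ** 2)


best_params, best_score = grid_search(
    toy_validation_score,
    {"learning_rate": [0.01, 0.1, 1.0], "regularization": [0.1, 1.0, 10.0]},
)
```

The cost grows with the product of the grid sizes, which is one reason platform support for parameter tuning is worth comparing.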

Meet the Contestants

• Amazon ML
• Azure ML
• Databricks Cloud

Amazon ML

• A relatively limited entry in terms of capabilities/algorithms offered.
• Appears targeted at existing AWS customers who want to do some basic ML investigations without requiring significant expertise.

Amazon ML

• Supported capabilities are described in use-case terminology, as opposed to names of algorithms:
  – Fraud detection
  – Content personalization
  – Document classification
  – Customer churn prediction
  – Relevancy modeling for marketing
  – Recommendations

Amazon ML

• Available algorithms:
  – Binary and Multiclass Classification
  – Regression
• Limited or no customizability: the algorithms are already implemented and chosen for you, e.g. binary classification is implemented via Logistic Regression
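
For readers newer to the technique, the following is a minimal sketch of logistic regression trained with batch gradient descent. It only illustrates the algorithm named above and makes no claim about how Amazon ML implements it internally; the function names, the toy one-feature data set, and the `learning_rate` and `epochs` defaults are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Hypothetical toy data: one feature, negative values are class 0, positive are class 1.
rows = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(rows, labels)
```

Calling `predict_proba(weights, bias, [2.5])` then returns a probability above 0.5, i.e. a class 1 prediction.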

Amazon ML

Available Performance Metrics

• Binary AUC: The binary ML model uses the Area Under the Curve (AUC) technique to measure performance.
• Regression RMSE: The regression ML model uses the Root Mean Square Error (RMSE) technique to measure performance. RMSE measures the difference between predicted and actual values for a single variable.
• Multiclass Avg F-Score: The multiclass ML model uses the F1 score technique to measure performance.
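
The three metrics above can be sketched in a few lines of plain Python. This is an illustrative sketch, not Amazon ML's own code: the function names are hypothetical, AUC is computed through its pairwise-ranking interpretation, and reading "Avg F-Score" as the unweighted (macro) average of per-class F1 scores is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def macro_f1(actual, predicted):
    """Unweighted mean of per-class F1 scores (assumed meaning of 'Avg F-Score')."""
    classes = sorted(set(actual) | set(predicted))
    f1_scores = []
    for c in classes:
        tp = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
        fp = sum(1 for a, p in zip(actual, predicted) if a != c and p == c)
        fn = sum(1 for a, p in zip(actual, predicted) if a == c and p != c)
        denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, because three of the four positive/negative pairs are ranked correctly.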

Amazon ML

• Data ingestion/integration
  – This is their strongest use case: easy integration with AWS storage media
    • S3, EBS
    • Redshift
    • RDS

Azure ML

• Introduced February this year
• But do not let its relative youth be a distraction: this is a feature-rich offering
• Takes a different approach: a more serious/rich set of algorithms and configurations is made available
• Default/canned algorithms are still available for those newer to Machine Learning

Azure ML

• More comprehensive selection of representative algorithms
• Provides more selections for the algorithms as well as tuning knobs

Azure ML

• First-class usability:
  – Tutorials
  – Walkthroughs
  – Videos
  – Integrated Development Environment
    • Azure ML Studio
  – Documentation

Azure ML

• Process Tools
  – Select the data processing, modeling, or prediction activity manually

Azure ML

• Or follow the suggested workflow

Azure ML

• The wizards are field-name and data-type aware

Azure ML

• Data Preparation stages

Azure ML

• Workflow Visualization

Azure ML

• View Prediction Results

Azure ML

• Workflow entries allow viewing/setting detailed configuration/parameters

Azure ML

• Workflow entries allow ad-hoc operations

Azure ML

• Point-and-click access to useful/popular public datasets

Azure ML

• Support for the popular "Notebooks" structures

Azure ML: More choices

• Regression:
  – Linear, Bayesian, Neural Network, Decision Forest, Boosted Decision Tree, Poisson
• Binary Classification:
  – SVM, Perceptron, LR, Bayes, NNs, Decision Forest
• Multiclass:
  – LR, NN, Decision Forest/Jungle, One-vs-All
• Anomaly Detection:
  – SVM, PCA
• Clustering: K-means
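
To make the clustering entry concrete, here is a minimal sketch of the K-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is a plain-Python illustration under simplifying assumptions, not Azure ML's implementation: the `kmeans` function, its caller-supplied initial centroids, and the fixed iteration count are all hypothetical choices.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Return (centroids, assignments) after a fixed number of K-means iterations."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid by squared Euclidean distance.
        assignments = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # leave a centroid in place if it has no members
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, assignments = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
```

The number of clusters and the starting centroids are exactly the kind of tuning knobs a platform may or may not expose.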

Azure ML: Available Algorithms

Databricks Cloud

• Spark has joined Hadoop as a de facto industry standard for distributed computing
• Rapidly approaching the popularity of Hadoop
  – And supplanting it if/when organizations can make the switch
• Databricks is the spin-off of Berkeley AMPLab, the original creators of Spark
• Databricks staff include a large fraction of the Spark core committers
• And an even larger proportion of the key decision makers/"shepherds"
  – Including the spark.ml/mllib shepherds
• Cloud-based availability of Spark including Spark SQL and spark.ml
• Access to capabilities of Spark MLlib, Spark DataFrames/SQL, Streaming, and Resilient Distributed Datasets
• Notebooks approach: Scala, Python, Java, and R

Spark Ecosystem

Databricks Cloud

• The online offering was announced July 2014 at Spark Summit
• Purpose statement: ease of workflow management for Data Scientists

Databricks Cloud

• The Databricks Cloud approach: Notebooks
• R, Python, Java, Scala

Databricks Cloud: Notebooks

• Notebooks are Data Scientists' friends
• A standard/typically preferred approach to doing their work
  – Experiment with data
  – Perform ad-hoc visualizations
  – Communicate/share results with colleagues
  – Or even publish them
• Wide spectrum of sophistication levels available:
  – Simply use existing libraries
  – Develop new algorithms from scratch

Databricks Cloud: Notebooks

Wrap-up/Summary

Three general types of approaches (not mutually exclusive)

• Point and Click (as well as backend APIs): Amazon ML, Azure ML
• APIs-only: Google Prediction API
• Notebooks: Azure ML, Databricks Cloud

Wrap-up/Summary

Amazon ML may be sufficient for:
  – Customers that already have data residing with the provider
  – Cases where simpler/fewer options are acceptable

Azure ML has a strong usability and workflow approach and provides a reasonable cross-section of algorithms for casual & intermediate users

Databricks Cloud has the most comprehensive offering:
  – Variety, performance, configurability of algorithms
  – Richness of the capabilities of the Notebooks
  – Options/configurability of the hosting clusters/environment

THANK YOU!

Ask Your Questions Here: http://info.thirdeyecss.com/ask_your_question
