Introduction to Microsoft R Open David Smith R Community Lead

Page 1: Introduction to Microsoft R Open

Introduction to Microsoft R Open
David Smith
R Community Lead

Page 2: Introduction to Microsoft R Open

January 28, 2016: Welcome!

• What is R?
• Applications of R
• Microsoft R Open
• Demo
• Q&A

David Smith
R Community Lead, Microsoft
@revodavid

Editor, Revolutions blog
blog.revolutionanalytics.com

Co-author (with Bill Venables and R Core Team), An Introduction to R
cran.r-project.org/manuals.html

Page 3: Introduction to Microsoft R Open

Poll 1: Which statement best matches your relationship to R?
- I’m completely new to R, but want to learn
- I’m learning R
- I’m an experienced R user
- I won’t be using R (but I’m interested in what it can do)

Page 4: Introduction to Microsoft R Open

What is R?

• Most widely used data analysis software
  • Used by 2M+ data scientists, statisticians and analysts

• Most powerful statistical programming language
  • Flexible, extensible and comprehensive for productivity

• Create beautiful and unique data visualizations
  • As seen in New York Times, The Economist and FlowingData

• Fills the Data Science talent gap
  • New graduates prefer R

• Thriving open-source community
  • Leading edge of analytics research

Page 5: Introduction to Microsoft R Open

CRAN: 7000+ add-on packages for R

CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/

Page 6: Introduction to Microsoft R Open

A brief history of R

1993 Research project in Auckland, NZ
  • Ross Ihaka and Robert Gentleman
1995 Released as open-source software
  • Generally compatible with the “S” language
1997 R core group formed
2003 R Foundation formed in Austria
2007 Revolution Analytics founded
2014 Revolution R Open launched
2015 R Consortium founded
2015 Microsoft acquires Revolution Analytics
2016 Microsoft R Open 3.2.3 released

Photo credit: Robert Gentleman

Page 7: Introduction to Microsoft R Open

R: The #1 software for Data Science… and #6 amongst general-purpose programming languages

R Usage Growth (Rexer Data Miner Survey, 2007-2015)
• 76% of analytic professionals report using R
• 36% select R as their primary tool

Language Popularity (IEEE Spectrum Top Programming Languages, 2015)

Page 8: Introduction to Microsoft R Open

200 Local R User Groups Worldwide

Find a user group near you: msdsug.microsoft.com

Page 9: Introduction to Microsoft R Open

Applications of R

Page 10: Introduction to Microsoft R Open

Rapid development

New York Times, June 25 2009 (3 hours after Michael Jackson’s death)

Page 12: Introduction to Microsoft R Open

Facebook
• Exploratory Data Analysis
• Experimental Analysis

“Generally, we use R to move fast when we get a new data set. With R, we don’t need to develop custom tools or write a bunch of code. Instead, we can just go about cleaning and exploring the data.” — Solomon Messing, data scientist at Facebook

Page 13: Introduction to Microsoft R Open

Housing

• Crime mapping

“The core innovation that Zillow offers are its advanced statistical predictive products, including the Zestimate®, the Rent Zestimate and the ZHVI® family of real estate indexes. By using R in production as well as research, Zillow maximizes flexibility and minimizes the latency in rolling out updates and new products.”

• Statistical forecasting

Page 14: Introduction to Microsoft R Open

The Azure Cloud

Map of Azure regions (legend: Operational, Announced)

• Central US: Iowa
• West US: California
• North Europe: Ireland
• East US: Virginia
• East US 2: Virginia
• US Gov: Virginia
• North Central US: Illinois
• US Gov: Iowa
• South Central US: Texas
• Brazil South: Sao Paulo
• West Europe: Netherlands
• China North *: Beijing
• China South *: Shanghai
• Japan East: Saitama
• Japan West: Osaka
• India West: TBD
• India East: TBD
• East Asia: Hong Kong
• SE Asia: Singapore
• Australia West: Melbourne
• Australia East: Sydney

* Operated by 21Vianet

Page 15: Introduction to Microsoft R Open

Microsoft Azure uses R for Reliability

• Capacity Planning
  • Forecasting hardware purchase requirements (forecast package)
  • Also RAM requirements for Microsoft IT

• System monitoring & alerting
  • Understanding user behavior (how users configure monitoring platform)
  • Visualizing infrastructure utilization data
  • Abnormal login detection
  • Custom R packages to analyze monitoring data (time series anomaly detection)
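
To give a feel for the capacity-planning use case, here is a minimal sketch in base R. It is not Microsoft's method: the usage figures and the capacity threshold are made up, and a straight-line trend fitted with lm() stands in for the richer models in the forecast package named on the slide.

```r
# Hypothetical monthly storage usage in TB (made-up numbers for illustration).
usage <- c(10, 12, 13, 15, 18, 19, 21, 24, 25, 27, 30, 32)
month <- seq_along(usage)

# Fit a straight-line trend. This is a deliberately simple stand-in for the
# models provided by the forecast package.
fit <- lm(usage ~ month)

# Project the next six months.
future <- data.frame(month = 13:18)
projected <- predict(fit, newdata = future)

# Hypothetical installed capacity; find the first month projected to exceed it.
capacity_tb <- 40
over <- future$month[projected > capacity_tb]
first_over <- if (length(over) > 0) over[1] else NA

print(round(projected, 1))
print(first_over)
```

The same pattern (fit on history, predict ahead, compare against a threshold) carries over when the linear model is swapped for a proper time-series model.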

Page 16: Introduction to Microsoft R Open
Page 17: Introduction to Microsoft R Open

Microsoft R Open

• Enhanced Open Source R distribution
• 100% compatible with all R-related software
• Faster performance with multi-threading
• CRAN “Time Machine” for reproducibility
• Available for Windows, Mac, and Linux
• Free and Open Source

Download from mran.microsoft.com

Page 18: Introduction to Microsoft R Open

MRO: Multi-threaded performance

• Intel MKL replaces standard BLAS/LAPACK algorithms
  • Download and install “MKL” from MRAN
  • Windows and Linux platforms

• High-performance algorithms
  • Pipelined operations optimized for Intel

• Sequential → Parallel
  • Uses as many threads as there are available cores
  • Control with: setMKLthreads(<value>)

• No need to change any R code

Benchmark details at MRAN

[Benchmark chart comparing R and MRO]
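
A minimal sketch of what "no need to change any R code" means in practice: the matrix operation below is ordinary R, and R hands it to whichever BLAS library is installed. The matrix size and the thread count of 2 are arbitrary illustrative choices, and the call to setMKLthreads() is guarded with exists() on the assumption that the function is only defined when the MKL add-on is present, so the script also runs on plain R.

```r
# Build a dense random matrix; crossprod() is delegated to the BLAS library.
set.seed(1)
n <- 500
x <- matrix(rnorm(n * n), nrow = n)

# setMKLthreads() is assumed to exist only when MKL is installed, so guard it.
if (exists("setMKLthreads")) {
  setMKLthreads(2)  # 2 is an arbitrary illustrative value
}

# Time the computation of t(x) %*% x.
elapsed <- system.time(xtx <- crossprod(x))["elapsed"]
cat("crossprod of a", n, "x", n, "matrix took", elapsed, "seconds\n")
```

Running the same script under different thread settings is a simple way to see the effect on your own hardware.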

Page 19: Introduction to Microsoft R Open

Reproducibility: share and validate

Academic / Research
• Verify results
• Advance Research

Business
• Production code
• Reliability
• Reusability
• Collaboration
• Regulation

www.nytimes.com/2011/07/08/health/research/08genes.html

http://arxiv.org/pdf/1010.1092.pdf

Page 20: Introduction to Microsoft R Open

A Reproducibility Problem

Adapted from http://xkcd.com/234/ CC BY-NC 2.5

Page 21: Introduction to Microsoft R Open

Package dependency explosion
R script file using 6 most popular packages

Any updated package = potential reproducibility error!
http://blog.revolutionanalytics.com/2014/10/explore-r-package-connections-at-mran.html
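
The "explosion" comes from indirect dependencies: each package a script loads pulls in its own dependencies, and so on. The sketch below walks a small, entirely hypothetical dependency graph (the package names are invented) to collect everything a single top-level package drags in. For real packages, tools::package_dependencies() with recursive = TRUE answers the same question against a CRAN index.

```r
# Hypothetical dependency graph: each name maps to the packages it imports.
deps <- list(
  scriptPkgA = c("libB", "libC"),
  libB = c("libD"),
  libC = c("libD", "libE"),
  libD = character(0),
  libE = c("libF"),
  libF = character(0)
)

# Collect every package reachable from the starting set (transitive closure).
all_dependencies <- function(start, graph) {
  seen <- character(0)
  queue <- start
  while (length(queue) > 0) {
    pkg <- queue[1]
    queue <- queue[-1]
    if (pkg %in% seen) next
    seen <- c(seen, pkg)
    # Enqueue dependencies not yet visited.
    queue <- c(queue, setdiff(graph[[pkg]], seen))
  }
  setdiff(seen, start)
}

needed <- all_dependencies("scriptPkgA", deps)
print(sort(needed))
```

One directly used package here depends on five others; a change to any of them can alter the script's results.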

Page 22: Introduction to Microsoft R Open

MRAN takes a snapshot of all 7,500+ packages every day

CRAN Time Machine

Page 23: Introduction to Microsoft R Open

Access snapshots with “checkpoint”

Add 2 lines to the top of your R script:

    library(checkpoint)
    checkpoint("2015-01-28")

(Any date after Sep 17, 2014)

• Downloads all required package versions as of January 28, 2015
• Easy for collaborators to reproduce your results
• Easy to use different package versions with different projects
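
A complementary sketch: instead of calling checkpoint(), a session can be pointed at a dated snapshot by setting the CRAN repository option. The helper name snapshot_repo is hypothetical, and the URL layout "https://mran.microsoft.com/snapshot/<date>" is an assumption used for illustration; check MRAN for the exact form before relying on it.

```r
# Build a snapshot repository URL for a given date.
# ASSUMPTION: the URL layout below is illustrative, not confirmed by the slides.
snapshot_repo <- function(date) {
  date <- as.Date(date)  # fails on strings that are not valid dates
  if (is.na(date)) stop("a snapshot date is required")
  paste0("https://mran.microsoft.com/snapshot/", format(date, "%Y-%m-%d"))
}

repo <- snapshot_repo("2015-01-28")

# Point install.packages() at the snapshot for this session only.
options(repos = c(CRAN = repo))
print(getOption("repos"))
```

Unlike checkpoint(), this does not scan the script for required packages or keep a separate per-project library; it only fixes where packages are downloaded from.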

Page 24: Introduction to Microsoft R Open

Poll 2: If you’re an R user, have you tried Microsoft R Open (or Revolution R Open)?
- I’ve never tried Microsoft R Open
- I’ve tried Microsoft R Open
- I primarily use Microsoft R Open
- I don’t use R

Page 25: Introduction to Microsoft R Open

Microsoft R Open Demo:
• Basic R
• Reproducibility with R

Learn R online at: www.datacamp.com/courses/free-introduction-to-r

Page 26: Introduction to Microsoft R Open

Use Microsoft R Open with…

• Microsoft R Server: Big-data analytics and distributed computing on Linux, Hadoop and Teradata
• SQL Server 2016: Big-data analytics integrated with SQL Server database (coming soon)
• PowerBI: Computations and charts from R scripts in dashboards
• Azure ML Studio: R Scripts in cloud-based Experiment workflows
• Visual Studio: R Tools for Visual Studio, an integrated development environment for R (coming soon)
• HDInsight: R integrated with cloud-based Hadoop clusters
• Cortana Analytics: Cloud-based R APIs and Virtual Machines

Page 27: Introduction to Microsoft R Open

Upcoming Microsoft R Server Webinars

• Thursday, February 4: Using Microsoft R Server to Address Scalability Issues in R
• Thursday, February 11: Data Mining with Microsoft R Server
• Thursday, February 18: Best Practices for using Microsoft R Server with Hadoop
• Thursday, February 25: Using Microsoft R Server to Operationalize your Analytics

Register: info.microsoft.com/Microsoft-R-Webinars.html

Page 28: Introduction to Microsoft R Open

Wrapping Up

• R is the leading language for data science today
• R is used for all kinds of advanced analytics applications
• Microsoft R Open is 100% compatible with R, and offers performance and reproducibility benefits
• Microsoft R Open is integrated with SQL Server, PowerBI, and more.
• Download from mran.microsoft.com

Any Questions?

Page 29: Introduction to Microsoft R Open

© Copyright Microsoft Corporation. All rights reserved.

Page 30: Introduction to Microsoft R Open

Bonus Slides

Page 31: Introduction to Microsoft R Open

Transformational Trends

• cloud computing: 5x increase, 2011 to 2016
• emerging data science talent: Universities filling 300,000 US talent gap
• data explosion: 90% of the data in the world today has been created in the last two years alone
• open source: e.g. R and Python

Page 32: Introduction to Microsoft R Open

• Working with the R Foundation
• Supporting the R user community
• Continuing the growth of the R Project
• Linux Foundation collaborative project
• Non-profit trade organization