Effective R: Tools tips and tricks for being a more effective data scientist 102 Wurster Hall UC Berkeley 21 October 2014

Source: files.meetup.com/3182622/r-enthusiasts-20141021.pdf




Polls

• How many of you are students?

• How many have used R for more than 1 year? More than 3 years? More than 5 years?

• How many use R as your principal data science tool?

• How many use Python? Julia? SAS or SPSS? Spark/Scala? Java?

• Ever spend too much time debating which technology fits?

11/4/2014 2 KNOW YOUR DATA


Given a vector of numbers (x), write a function (f) that returns a vector in which each element is the product of all the other elements of x, that is, every element except the one at the current index.

Example:

> x <- c( 1, 5, 2, 8 )
> f(x)
[1] 80 16 40 10
# 5*2*8, 1*2*8, 1*5*8, 1*5*2


Decision Patterns

Founded 2010

Bring together complementary skills for managing data:

Acquisition * Organization * Storage * Access * Utilization

Our Model

Service Consulting

Accept no VC funding

Use consulting margins to build niche products

Our Customers

Financial Services, Retail, Entertainment, Food, Communications, Defense, Environmental.


We get to work on a

• variety of problems,

• with a variety of technologies

• in a variety of fields


We have to work on a

• variety of problems,

• with a variety of technologies

• in a variety of fields


DATA SCIENTIST OUTLOOK



Source: http://venturebeat.com/2013/11/11/data-scientists-needed/


COMPETITION

Much of the work will not be done by traditional workers


INNOVATION

Spoils go to those who make products from repeatable processes


Google Prediction API


The price for analytics is falling …


                       Python                        R
Paradigm               typed, interpreted, OO        typed, interpreted, functional, vectorized
Popularity (TIOBE)     8th rank, -0.77%              15th rank, +0.97%
Packages               PyPI: 50,321 packages,        CRAN: 5,975 packages,
                       35+ updates/day               15 updates/day
(row label lost)       351,510                       70,652
Development tools      Spyder, IPython, Eclipse      RStudio, Eclipse
Data packages          (not recovered)               (not recovered)
Major ML packages      (not recovered)               (not recovered)


Given a vector of numbers (x), write a function (f) that returns a vector in which each element is the product of all the other elements of x, that is, every element except the one at the current index.

Example:

> x <- c( 1, 5, 2, 8 )
> f(x)
[1] 80 16 40 10
# 5*2*8, 1*2*8, 1*5*8, 1*5*2

Solution:

f <- function(x) prod(x) / x
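
The one-line solution relies on division, so it breaks silently when x contains a zero: prod(x) is then 0 and the entry at the zero's position becomes 0 / 0, which R reports as NaN. The sketch below shows this alongside a zero-safe variant. The name f_safe is hypothetical and not from the talk; it trades the vectorized division for an explicit product over x[-i].

```r
# Solution from the slide: divide the total product by each element.
f <- function(x) prod(x) / x

# Hypothetical zero-safe variant (not from the talk): for each index i,
# multiply every element except x[i]. Quadratic in length(x), but no division.
f_safe <- function(x) {
  vapply(seq_along(x), function(i) prod(x[-i]), numeric(1))
}

x <- c(1, 5, 2, 8)
f(x)       # 80 16 40 10
f_safe(x)  # 80 16 40 10

y <- c(1, 0, 2)
f(y)       # 0 NaN 0  (0 / 0 at the zero's position)
f_safe(y)  # 0 2 0
```

For vectors known to contain no zeros, the slide's version is shorter and faster; the variant only matters when zeros can occur.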


                           Python                                         R
Learning curve             Easier, esp. if coming from an OO background   Steeper. More, dedicated functions
Code maintainability       Better package system, fewer name clashes      Better documentation. Generally less code req’d
Performance                Higher, extensible through Cython, C, C++      Rcpp
Code expressiveness        Hack to extend operators                       Lazy evaluation, %x% syntax used widely, non-standard evaluation
Dedicated web frameworks   Translucent                                    Shiny
Feature completeness                                                      Rmarkdown, Reproducible Research, ProjectTemplate
Vendor entrenchment                                                       Windows Azure, Oracle, MicroStrategy, Birst, Tableau
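
The "Code expressiveness" entries name three R features: user-defined %x% infix operators, lazy evaluation of arguments, and non-standard evaluation. A minimal base-R sketch of each follows. The names %+%, lazy_demo and show_expr are hypothetical and chosen only for illustration; none come from the talk.

```r
# 1. User-defined infix operator: a function whose name has the form
#    %something% can be called between its two arguments.
`%+%` <- function(a, b) paste0(a, b)
"data" %+% ".table"                      # "data.table"

# 2. Lazy evaluation: an argument is evaluated only when it is first used,
#    so the stop() call below never runs.
lazy_demo <- function(x, unused) x * 2
lazy_demo(21, stop("never evaluated"))   # 42

# 3. Non-standard evaluation: substitute() captures the expression the
#    caller typed rather than its value, so a and b need not exist.
show_expr <- function(expr) deparse(substitute(expr))
show_expr(a + b * 2)                     # "a + b * 2"
```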


BREAKDOWN OF CODE TO

[Chart; category labels: Data Management, Statistical Operations, Visualization, Presentation Formatting, Delivery, Other (Misc)]


What about …


SECRETS TO LEET CODING


Adopt standards: cf. Python’s

• PEP-8: Naming and Formatting

• PEP-257: Documentation

• PEP-20: Readability

We do not follow Google’s coding convention

Use version control:

• GitHub, Bitbucket, GitLab

• Best GUI: Atlassian SourceTree

Use Agile methods

Track issues: JIRA, GitHub, GitLab

Commit early and often.

Good PM is worth every penny.


SECRETS TO LEET CODING 2


Follow Established Development Patterns

Goal                       Description                                   R Packages
Ad hoc analysis            Create a process                              ProjectTemplate, Rmarkdown
Package development        Create a package                              RStudio, Roxygen2, devtools
Application: interactive   Web application                               JavaScript, Shiny, OpenCPU
Application: automated     Code to be scheduled or called as an event    Rscript (R -e), optigrab, crontab
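
For the automated-application pattern, here is a minimal sketch of a script meant to be run non-interactively with Rscript, for example from cron. It uses only base R's commandArgs() rather than optigrab. The file name report.R, the --date flag and the helper parse_date_arg are all hypothetical.

```r
#!/usr/bin/env Rscript
# report.R (hypothetical file name): run non-interactively, e.g. from cron.

# Hypothetical helper: pull the value of "--date=YYYY-MM-DD" out of the
# argument vector, defaulting to today's date when the flag is absent.
parse_date_arg <- function(args) {
  hit <- grep("^--date=", args, value = TRUE)
  if (length(hit) == 0) return(Sys.Date())
  as.Date(sub("^--date=", "", hit[1]))
}

# commandArgs(trailingOnly = TRUE) returns only the arguments supplied
# after the script name on the command line.
run_date <- parse_date_arg(commandArgs(trailingOnly = TRUE))
cat("Running report for", format(run_date), "\n")
```

A crontab entry such as `0 6 * * * Rscript /path/to/report.R` (path hypothetical) would run the script every day at 06:00, and `Rscript report.R --date=2014-10-21` would override the date by hand.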

Creativity is generally a bad thing


R’S NOBELS


DATA.TABLE

Munging and data management


https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping


MAGRITTR

Code readability, interactive programming


FOREACH, ITERTOOLS

Scaling-out


RCPP, GMATRIX

Increasing Performance


CARET (CLASSIFICATION AND REGRESSION TRAINING)

For Machine Learning


GGPLOT2, GGVIS

Visualization


APPENDIX
