
the data pirate’s R-cheatsheet


By Danny Blaker
data analyst at www.leanstartup.chat
blogger www.dannyblaker.com

Basic Operators
+ add
- subtract
* multiply
/ divide
^ power of
%% modulo (remainder after division)
& and

Variables
x <- value assigns a value to the variable “x”
x prints the variable to the console
x + y adds 2 variables together
z <- x + y adds 2 variables together and stores the result in a new variable (z)

Classes
3 main classes:
12 Numeric (number)
"hello" Character (string)
TRUE Logical (TRUE or FALSE)
class(x) returns the class of x
unclass(x) returns x with its class attribute removed
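
A small sketch of the operator, variable and class entries above. The values and variable names are made up for illustration.

```r
# Arithmetic and modulo
x <- 23
y <- 5
z <- x + y            # store the sum in a new variable
remainder <- x %% y   # 23 = 4 * 5 + 3, so the remainder is 3
power <- y ^ 2

# One value from each of the three main classes
num <- 12
txt <- "hello"
flag <- TRUE
classes <- c(class(num), class(txt), class(flag))
print(classes)

# & is TRUE only when both sides are TRUE
both <- (x > 20) & (y > 20)
```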

HELP
?my_function type “?” before any function name in the console to access its documentation
args(function) lists the arguments of a function
typeof() shows vector type
length() shows vector length
range() shows the minimum and maximum of a vector
print(x) prints x
options() sets global options

Variables Example
x <- 23
x

Basic Operators (continued)
== equals
!= does not equal
> greater than
< less than
>= greater than or equal to
<= less than or equal to
| or

Vectors
x <- c(1,2,3) creates a vector “x” containing the numbers 1, 2 and 3
c() combines objects / elements
vectors can be added, subtracted, multiplied and divided element by element. The result can be stored in a new variable.
names() sets the names for each element in a vector
[] selects an element of a vector
vector_1 > vector_2 checks if each element in vector_1 is greater than the corresponding element in vector_2

c() Example
x <- c(23,54)
y <- c(12,14)
z <- x + y

names() Example
names(x) <- y

[] Example
x[c(3:6)] selects elements 3 to 6 of vector “x”

Libraries & Packages
install.packages("package_name") installs the package you specify
library("package_name") loads a package into the workspace
search() lists the packages currently attached

Matrices
matrix(1:9, byrow = TRUE, nrow = 3) creates a matrix containing the numbers 1-9 across 3 rows, filled row by row
colnames() assigns column names
rownames() assigns row names
dimnames = list(c("row names"), c("column names")) is another way to assign row and column names (an argument of matrix())
rbind() combines data by rows
cbind() combines data by columns
colSums() sum of matrix columns
rowSums() sum of matrix rows
x[1,1] selects the element in row 1, column 1 of matrix “x”
entire matrices can be multiplied or divided element by element, like a regular vector (use %*% for matrix multiplication)

Factors
Factors store categorical variables; their levels can be unordered or ordered
factor(x) makes x a factor
factor(x, ordered = FALSE) makes x an unordered factor
factor(x, ordered = TRUE, levels = c("1st", "2nd", "3rd")) makes x into a factor with ordered levels: 1st, 2nd and 3rd
levels(x) <- c("1", "2") renames the levels of factor x to “1” and “2”

Data.frames
str(x) structure of variable “x” or data set
dim(x) quickly shows the number of observations (rows) & variables (columns)
names() shows top level names of a list or dataset
summary(x) instant summary of “x”
head(x) shows start of dataset “x”
tail(x) shows end of dataset “x”
str(head(x)) shows structure of start of dataset “x”
str(tail(x)) shows structure of end of dataset “x”
subset(x, subset = column_1 > 1) creates a subset of data frame “x” with all entries where “column_1” is greater than 1
order(x) returns the index order that sorts x; x[order(x$column_1), ] sorts data frame “x” by “column_1”
list() creates a list
$ selects a column of a data frame
append(x,y) appends vector y to vector x
sort(x, decreasing = FALSE, ...) sorts vector x

data.frame() creates a data frame

data.frame Example
items <- c("parrot","sword")
islands <- c("skull island","treasure caverns")
pirate_brochure <- data.frame(items, islands)
creates a 2x2 data frame stored in “pirate_brochure”

data frame element selection Example
pirate_brochure[] selects elements of data frame “pirate_brochure”
pirate_brochure[1, ] selects row 1
pirate_brochure[ ,1] selects column 1
pirate_brochure[1,1] selects observation row 1 col 1
pirate_brochure[1:2, "items"] selects 1st and 2nd observations in column “items”
df$names selects the “names” column of data frame “df”
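
The vector, matrix, factor and data frame entries can be tried together in one short session. This is only a sketch: the "price" column, the row and column names, and the variable "place" are assumptions added for illustration and are not part of the cheat sheet.

```r
# Vectors: element-wise arithmetic, names and comparison
x <- c(23, 54)
y <- c(12, 14)
z <- x + y
names(z) <- c("first", "second")
bigger <- x > y                 # compares element by element

# Matrix filled row by row, with row and column names
m <- matrix(1:9, byrow = TRUE, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("a", "b", "c")))
row_totals <- rowSums(m)
top_left <- m[1, 1]

# Ordered factor: comparisons follow the order of the levels
place <- factor(c("2nd", "1st", "3rd"), ordered = TRUE,
                levels = c("1st", "2nd", "3rd"))

# Data frame from the cheat sheet plus a hypothetical "price" column
items <- c("parrot", "sword")
islands <- c("skull island", "treasure caverns")
price <- c(5, 2)                # assumption: made-up values
pirate_brochure <- data.frame(items, islands, price)

first_item <- pirate_brochure[1, "items"]
cheap <- subset(pirate_brochure, subset = price < 3)
by_price <- pirate_brochure[order(pirate_brochure$price), ]
str(pirate_brochure)
```

Note that order() only returns row positions; the sorting happens when those positions are used inside [ ].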

dbConnect example
dbConnect(RMySQL::MySQL(), dbname = "db", host = "db.amazonaws.com", port = 0000, user = "test", password = "1234")

Basic Queries
mean() average
sum() sum
abs() absolute value
sd() standard deviation
sqrt() square root
norm() norm of matrix
median() median value
dnorm() density of the normal distribution
pnorm() cumulative distribution function of the normal distribution
qnorm() quantile function of the normal distribution
rnorm() random draws from the normal distribution
strsplit() splits strings
identical() checks whether two objects are exactly equal
cat() concatenates and prints
paste0() converts to strings and concatenates (no separator)
lm() fits linear model
split() divides data into groups (unsplit() reassembles them)
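
A few of the functions above applied to made-up numbers. The data and variable names are hypothetical; the straight-line data for lm() was chosen so the fitted coefficients are easy to check by eye.

```r
scores <- c(2, 4, 4, 4, 5, 5, 7, 9)    # hypothetical sample values
avg <- mean(scores)
mid <- median(scores)
spread <- sd(scores)                    # sample standard deviation

half <- pnorm(0)                        # P(Z <= 0) for a standard normal

labels <- paste0("score_", 1:2)         # no separator between the pieces
parts <- strsplit("skull island", " ")[[1]]
groups <- split(scores, scores > 4)     # list with elements "FALSE" and "TRUE"

# Linear model on points that lie exactly on y = 2x + 1
x <- 1:5
y <- 2 * x + 1
fit <- lm(y ~ x)
coefs <- unname(coef(fit))              # intercept, then slope
cat("intercept and slope:", coefs, "\n")
```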

If statement
if (condition1) {
  expr1
} else if (condition2) {
  expr2
} else if (condition3) {
  expr3
} else {
  expr4
}

While loop
while (condition) {
  expr
}

For loop
for (var in seq) {
  expr
}

break exits the loop
next skips to the next iteration of the loop
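
The control-flow templates above, filled in with a made-up task: add up the odd numbers from 1 upward and stop once the counter passes 7. All names and limits are illustrative.

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    next                  # even number: skip to the next iteration
  } else if (i > 7) {
    break                 # odd number above 7: leave the loop
  } else {
    total <- total + i    # adds 1, 3, 5 and 7
  }
}

# while repeats for as long as its condition stays TRUE
countdown <- 3
steps <- 0
while (countdown > 0) {
  countdown <- countdown - 1
  steps <- steps + 1
}
```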

Functions

Function syntax
my_fun <- function(arg1, arg2) {
  body
}

return(x) returns x
is.na() returns TRUE for each missing element (sum(is.na(x)) counts them)
warning(..., call. = FALSE) warning message
stop(..., call. = TRUE) stops execution with an error message
message() diagnostic message
any() is at least one value TRUE
lapply(X, FUN, ...) iterates over X with a function and returns a list
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) iterates over X with a function and returns a vector or matrix where possible
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE) iterates over X with a function and returns output of the type and length specified by FUN.VALUE
replicate(n, expr, simplify = "array") sapply for repeated evaluation
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) multivariate version of sapply
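
A sketch of a user-defined function together with the apply family. The helper "safe_mean" and the list "crew" are hypothetical names invented for this example.

```r
# Hypothetical helper: mean of the non-missing values
safe_mean <- function(values) {
  if (!is.numeric(values)) {
    stop("values must be numeric", call. = FALSE)
  }
  if (any(is.na(values))) {
    message(sum(is.na(values)), " missing value(s) dropped")
  }
  return(mean(values, na.rm = TRUE))
}

result <- safe_mean(c(1, NA, 3))

crew <- list(a = 1:3, b = 4:6)
as_list <- lapply(crew, mean)                          # always a list
as_vector <- sapply(crew, mean)                        # simplified to a named vector
checked <- vapply(crew, mean, FUN.VALUE = numeric(1))  # errors if a result is not one number
pairs <- mapply(function(x, y) x + y, 1:3, 4:6)        # walks over both inputs in parallel
repeated <- replicate(3, 1 + 1)                        # evaluates the expression 3 times
```

vapply() is the stricter choice inside functions, because the shape of the result is checked against FUN.VALUE instead of being guessed.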

library example
library(ggplot2)
qplot(pirates$swords, pirates$parrots)
creates a plot with swords & parrots columns of “pirates” data frame

Regular Expressions
grep() returns a vector of indices of the character strings that contain a specific pattern
grepl() returns TRUE when a pattern is found in the corresponding character string
sub() search for and replace (first match only)
gsub() search for and replace (ALL matches)
^ match content at start of string
$ match content at end of string
.* match any character zero or more times
\\ escapes a character (eg. “.”)

Regex syntax example
pirates <- c("[email protected]", "[email protected]")
replace all “pirateparrot”s with “piratesword”s
sub("@.*\\.com$","@piratesword.com", pirates)
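
The same pattern applied to hypothetical addresses. The e-mail addresses below are invented for this sketch and are not the ones from the original sheet.

```r
# Hypothetical addresses; the last one does not end in ".com"
pirates <- c("anne@pirateparrot.com", "jack@pirateparrot.com", "mary@navy.org")

hits <- grep("pirateparrot", pirates)     # positions of matching strings
found <- grepl("pirateparrot", pirates)   # one TRUE/FALSE per string

# "@", then anything, then a literal ".com" at the end of the string
swapped <- sub("@.*\\.com$", "@piratesword.com", pirates)

first_only <- sub("a", "A", "banana")     # replaces the first match
all_matches <- gsub("a", "A", "banana")   # replaces every match
```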

Dates & Times
Sys.Date() current date
Sys.time() current time
%Y 4-digit year (2016)
%y 2-digit year (16)
%m 2-digit month (01)
%d 2-digit day of the month (20)
%A weekday (Monday)
%a abbreviated weekday (Mon)
%B month (September)
%b abbreviated month (Sep)
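
The format codes are used with format() to print a date and with as.Date() to read one; both are base R functions that the sheet does not list. The date below is an arbitrary example.

```r
d <- as.Date("2016-09-20")

year4 <- format(d, "%Y")
year2 <- format(d, "%y")
month <- format(d, "%m")
day <- format(d, "%d")
stamp <- format(d, "%d/%m/%Y")   # codes can be combined with any separators

# The same codes describe the layout when reading text into a date
parsed <- as.Date("20/09/2016", format = "%d/%m/%Y")

weekday <- format(d, "%A")       # the weekday name depends on the system locale
today <- Sys.Date()
```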

Graphs
plot(x, y, ...) basic scatter plot
hist() histogram
boxplot() box plot
density() kernel density estimate (draw it with plot(density(x)))
dotchart() dot plot
barplot() bar plot
lines() adds a line to an existing plot
pie() pie chart

List Subsetting
list1[[x]] returns element x in list1
list1[[x]][[1]] returns the first element inside the element called x in list1
list1[[x]][[1]][[2]] returns the second element inside the first element inside the element called x in list1
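
The three levels of [[ ]] on a small nested list. The list contents and the element name "ship" are made up and stand in for "x" in the entries above.

```r
list1 <- list(
  ship = list(c("mast", "sail"), "anchor"),
  crew = c("anne", "jack")
)

level1 <- list1[["ship"]]             # the whole inner list
level2 <- list1[["ship"]][[1]]        # its first element, a character vector
level3 <- list1[["ship"]][[1]][[2]]   # the second value of that vector
```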

purrr
map(.x, .f, ...) applies a function to each element of a vector or list
pmap(.l, .f, ...) maps over multiple inputs
%>% pipes: “x %>% f(y)” is the same as “f(x, y)”
filter(.data, ...) keeps the rows that match a condition (requires dplyr package)

Importing Data

Utils
read.table()
read.csv()
read.delim()

readr
read_delim()
read_csv()
read_tsv()
skip skips rows from beginning
n_max maximum rows to import
fread() fast import (requires data.table package)
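
A sketch of the base (utils) readers on a throwaway file, so it runs without extra packages. The file contents are hypothetical. In base R the counterpart of readr's n_max is the nrows argument, while skip has the same name in both.

```r
# Write a small hypothetical CSV to a temporary path
path <- tempfile(fileext = ".csv")
writeLines(c("items,islands",
             "parrot,skull island",
             "sword,treasure caverns",
             "map,skull island"), path)

all_rows <- read.csv(path, stringsAsFactors = FALSE)

# Import only the first 2 data rows
first_two <- read.csv(path, nrows = 2, stringsAsFactors = FALSE)

# skip = 1 drops the header line, so column names are supplied by hand
no_header <- read.csv(path, skip = 1, header = FALSE,
                      col.names = c("items", "islands"),
                      stringsAsFactors = FALSE)

unlink(path)   # remove the temporary file
```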

excel_sheets() prints sheet names (readxl package)
read_excel() imports data from a spreadsheet (readxl package)
read.xls() imports from .xls (requires gdata package)
loadWorkbook() imports workbook (requires XLConnect package)
getSheets() reads sheets (XLConnect)
readWorksheet() imports sheets (XLConnect)

read_sas("file_name.sas7bdat") imports SAS file (requires haven package)

Importing Data PACKAGES
DBI base package
RMySQL MySQL
ROracle Oracle
RPostgreSQL PostgreSQL
dbConnect() connects to database