
Introduction to R for Data Science :: Session 3


Introduction to R for Data Science

Lecturers

dipl. ing Branko Kovač

Data Analyst at CUBE / Data Science Mentor at Springboard

Data Science zajednica Srbije


dr Goran S. Milovanović

Data Scientist at DiploFoundation

Data Science zajednica Srbije


Lists in R

• Lists can contain elements (objects) of various types/classes

• Lists can be recursive: a list of lists

• In R we use lists a lot; however, computing over lists is seldom the most efficient way

Intro to R for Data Science

Session 2: Lists & Functions

# Introduction to R for Data Science

# SESSION 3 :: 12 May, 2016

# It's time to speak about lists
num_vct <- c(2:5) # just another num vector
chr_vct <- c("data", "science") # char vector
data_frame <- data.frame(x = c("a", "b", "c", "d"), y = c(1:4)) # simple df
lista <- list(data_frame, num_vct, chr_vct) # and this is a list
lista # this is our list
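The points about mixed types and recursion can be checked directly. The following is a minimal sketch, assuming a hypothetical list called `nested` that is not part of the session script:

```r
# Hypothetical example list (not from the session script)
nested <- list(
  numbers = 2:5,                              # integer vector
  words   = c("data", "science"),             # character vector
  inner   = list(flag = TRUE, value = 3.14)   # a list inside a list
)

# Class of each top-level element
classes <- sapply(nested, class)
print(classes)

# Reaching into the inner list
inner_flag <- nested$inner$flag
print(inner_flag)
```

Each top-level element keeps its own class, and the `inner` element is itself a list that can be subset further.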


Lists in R

• Subsetting lists

• Think of an element (a node) of a list as a “container” which is always a list itself

• Subsetting with [[ ]] and [ ] – careful!


str(lista) # about a list
length(lista)
as.list(chr_vct) # another way to create a list
# Lists manipulation
names(lista) <- c("data", "numbers", "words")
lista[3] # 3rd element?
lista[[3]] # 3rd element?
is.list(lista[3]) # is this a list?
is.list(lista[[3]]) # and this?
class(lista[[3]]) # also a list? Don't be so sure!
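The difference between `[ ]` and `[[ ]]` is easiest to see side by side. A small sketch, assuming a hypothetical list `l` defined only for this illustration:

```r
# Hypothetical list for illustration (not from the session script)
l <- list(data = 1:3, words = c("data", "science"))

single <- l["words"]     # single brackets: a list of length 1 (the "container")
double <- l[["words"]]   # double brackets: the character vector inside it

is.list(single)   # TRUE
is.list(double)   # FALSE
class(double)     # "character"
```

Single brackets always return a list, even when only one element is selected; double brackets return the element itself.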


Lists in R

• More subsetting

• Adding and removing a node

• unlist()


lista$words # we can also extract an element this way
lista[["words"]] # or even like this
lista[["words"]][1] # digging even deeper
lista$new_elem <- c(TRUE, FALSE, FALSE, TRUE) # add new element
length(lista) # now list has 4 elements
lista$new_elem <- NULL # but we can remove it easily
new_vect <- unlist(lista) # creating a vector from list
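One thing worth knowing about `unlist()` is that it flattens everything into a single atomic vector, so mixed types get coerced to a common one. A minimal sketch, assuming a hypothetical list `mixed`:

```r
# Hypothetical list mixing integers and a string (not from the session script)
mixed <- list(numbers = 1:2, word = "data")

flat <- unlist(mixed)

class(flat)   # "character": the integers were coerced to strings
names(flat)   # "numbers1" "numbers2" "word"
```

The element names are carried over, with an index appended where an element contributed more than one value.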


Functions in R


# Functions
# (w. less formalism but tips & tricks added)

# elementary: a definition
fun <- function(x) x+10; fun(5)
# taking two arguments
fun2 <- function(x,y) x+y; fun2(3,4)
# using "{" and "}" to enclose multiple R
# expressions in the function body
fun <- function(x,y) { a <- sum(x); b <- sum(y); a-b }


r <- c(5,4,3); q <- c(1,1,1)
fun(r,q)
fun(c(5,4,3),c(1,1,1))
# NOTE: "{" and "}" are generally used in R
# to mark the beginning and the end of
# a block
# a function is a function:
is.function(fun)
is.function(log) # log is built-in
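When a function body is a `{ }` block, the value of the last evaluated expression is what the function returns; `return()` is optional. A short sketch, assuming two hypothetical helpers `sum_diff` and `sum_diff_explicit` that mirror the two-argument function above:

```r
# Hypothetical helper: implicit return of the last expression
sum_diff <- function(x, y) {
  a <- sum(x)
  b <- sum(y)
  a - b            # this value is returned
}

# Same logic with an explicit return()
sum_diff_explicit <- function(x, y) {
  return(sum(x) - sum(y))
}

result   <- sum_diff(c(5, 4, 3), c(1, 1, 1))            # 12 - 3
result_2 <- sum_diff_explicit(c(5, 4, 3), c(1, 1, 1))
```

Both calls give the same value; which style to use is largely a matter of taste.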


# Functional programming ("Everything is a function...")
"^"(2,2)
"^"(2,3)
# magic! - how do you do that?
2^2
2^3
# the difference between "operators" and "functions" in R: none
# Everything is a function:
"+"(2,2) # Four?
2+2 # yeah, right - Oh but I love this
"-"("+"(3,5),2)
"&"(">"(2,2),T)
"&"(">"(3,2),T)
# punishment: write all your lab code for this week in this fashion...

Functions in R

• Functional programming
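Because operators are ordinary functions, they can be passed as arguments to other functions. The following is a small sketch of that idea; the variable names are made up for illustration:

```r
# Operators passed around like any other function (backticks quote the name)
total   <- Reduce(`+`, 1:4)          # ((1 + 2) + 3) + 4
squares <- sapply(1:3, `^`, 2)       # 1^2, 2^2, 3^2
four    <- do.call("+", list(2, 2))  # same as "+"(2, 2)

total
squares
four
```

This is the practical payoff of "everything is a function": no wrapper such as `function(x, y) x + y` is needed when the operator itself will do.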


# Step 1: here's a list:
aList <- list(c(1,2,3), c(4,5,6), c(7,8,9), c(10,11,12))
# Step 2: I want to apply the following function:
myFun <- function(x) {x[1]+x[2]-x[3]}
# to all elements of the aList list, and get the result as a list again.
# Here it is:
res <- lapply(aList, function(x) { x[1]+x[2]-x[3]})
unlist(res) # to get a vector

Lists and Functions in R

• Two things that come in handy: lapply() and apply()
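A named function can be passed to `lapply()` directly, and `sapply()` or `vapply()` return a vector without a separate `unlist()` step. The session itself only uses `lapply()`; the two relatives are shown here as an optional addition, in a sketch that redefines `aList` and `myFun` so it runs on its own:

```r
aList <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9), c(10, 11, 12))
myFun <- function(x) x[1] + x[2] - x[3]

res_list <- lapply(aList, myFun)              # always a list
res_vec  <- sapply(aList, myFun)              # simplified to a vector when possible
res_safe <- vapply(aList, myFun, numeric(1))  # vector, with the result type checked

res_vec   # 0 3 6 9
```

`vapply()` fails with an error if any result is not a single numeric value, which makes it the more defensive choice in scripts.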


# Now say I've got a matrix
myMat <- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, ncol=3)
# now, I want the sums of all rows:
rsMyMat <- apply(myMat, 1, function(x) {sum(x)})
rsMyMat
is.list(rsMyMat) # just beautiful
# for columns:
csMyMat <- apply(myMat, 2, function(x) {sum(x)})



# with existing functions, such as sum(), this will do:
rsMyMat1 <- apply(myMat, 1, sum)
rsMyMat1
csMyMat1 <- apply(myMat, 2, sum)
csMyMat1
# try also...
rowSums(myMat)
colSums(myMat)
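To confirm that `apply()` with `sum` agrees with `rowSums()` and `colSums()`, and that the result is a plain vector rather than a list, here is a self-contained sketch using a hypothetical matrix `m` with the same values as above:

```r
# matrix() fills column by column: rows are (1,4,7), (2,5,8), (3,6,9)
m <- matrix(1:9, nrow = 3, ncol = 3)

rs_apply <- apply(m, 1, sum)   # margin 1 = rows
cs_apply <- apply(m, 2, sum)   # margin 2 = columns

rs_apply            # 12 15 18
cs_apply            # 6 15 24
is.list(rs_apply)   # FALSE: apply() returned an atomic vector

all(rs_apply == rowSums(m))   # TRUE
all(cs_apply == colSums(m))   # TRUE
```

For plain sums, `rowSums()` and `colSums()` are the more direct choice; `apply()` earns its place when the function is something other than a sum or a mean.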

