R Programming, Sakthi Dasan Sekar, http://shakthydoss.com

3 R Tutorial Data Structure

Data structures

a) Vector

b) Matrix

c) Array

d) Data frame

e) List

Vectors are one-dimensional arrays whose elements all share the same mode (numeric, character, or logical).

a <- c(1, 2, 5, 3, 6, -2, 4)

b <- c("one", "two", "three")

c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)

a is a numeric vector,

b is a character vector, and

c is a logical vector.
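The types can be checked with class(). The following is a minimal sketch that reuses the three vectors above; the third vector is named flags here rather than c, an assumption made only to keep it visually distinct from the c() function.

```r
a <- c(1, 2, 5, 3, 6, -2, 4)
b <- c("one", "two", "three")
flags <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)  # hypothetical name for the vector c

class(a)      # "numeric"
class(b)      # "character"
class(flags)  # "logical"
length(a)     # 7
```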

Scalars are one-element vectors.

f <- 3

g <- "US"

h <- TRUE

They’re used to hold constants.
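Because a scalar is just a vector of length one, the usual vector functions work on it. A short sketch using the value f from above:

```r
f <- 3

is.vector(f)  # TRUE
length(f)     # 1
f[1]          # 3, the only element
```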

The colon operator :

a <- c(1:5)

is equivalent to

a <- c(1, 2, 3, 4, 5)
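The two forms hold the same values, although the colon operator stores them as integers while c(1, 2, 3, 4, 5) stores doubles. The sketch below illustrates this; the names from_colon and from_c are hypothetical.

```r
from_colon <- 1:5
from_c <- c(1, 2, 3, 4, 5)

all(from_colon == from_c)      # TRUE: the values match
class(from_colon)              # "integer"
class(from_c)                  # "numeric"
identical(from_colon, from_c)  # FALSE: the storage types differ

10:7                           # the operator also counts down: 10 9 8 7
```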

Vector

You can refer to elements of a vector using a numeric vector of positions within brackets.

Example

vec <- c("a", "b", "c", "d", "e", "f")

vec[1] # returns the first element of the vector

vec[c(2,4)] # returns the 2nd and 4th elements of the vector
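A few more indexing forms may be useful. This is a small sketch on the same vector, not an exhaustive list:

```r
vec <- c("a", "b", "c", "d", "e", "f")

vec[2:4]          # a range of positions: "b" "c" "d"
vec[-1]           # a negative position drops that element: "b" "c" "d" "e" "f"
vec[length(vec)]  # the last element: "f"
```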

Matrices

A matrix is a two-dimensional data structure in R. All elements of a matrix must have the same mode (numeric, character, or logical). Matrices are created with the matrix() function.

vector <- c(1,2,3,4)

foo <- matrix(vector, nrow=2, ncol=2)

Matrices: byrow (optional parameter)

byrow=TRUE fills the matrix row by row.

byrow=FALSE (the default) fills the matrix column by column.

foo <- matrix(vector, nrow=2, ncol=2, byrow = TRUE)

foo <- matrix(vector, nrow=2, ncol=2, byrow = FALSE)
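Printing both results makes the difference visible. In this sketch the names by_row and by_col are hypothetical, chosen so that the two matrices can be compared side by side.

```r
vector <- c(1, 2, 3, 4)

by_row <- matrix(vector, nrow = 2, ncol = 2, byrow = TRUE)
by_col <- matrix(vector, nrow = 2, ncol = 2, byrow = FALSE)

by_row
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4

by_col
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4

identical(by_row, t(by_col))  # TRUE: one is the transpose of the other
```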

Matrix elements can be accessed with subscripts inside brackets.

Example

mat <- matrix(c(1:4), nrow=2,ncol = 2)

mat[1,] # returns the first row of the matrix

mat[2,] # returns the second row of the matrix

mat[,1] # returns the first column of the matrix

mat[,2] # returns the second column of the matrix

mat[1,2] # returns the element in the first row and second column
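Since matrix() fills column by column by default, the values returned by these subscripts can be worked out by hand. A short sketch with the same matrix:

```r
mat <- matrix(1:4, nrow = 2, ncol = 2)

mat[1, ]                # 1 3
mat[, 2]                # 3 4
mat[1, 2]               # 3
dim(mat)                # 2 2

# A single row comes back as a plain vector; drop = FALSE keeps it a matrix.
dim(mat[1, , drop = FALSE])  # 1 2
```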

Array

Arrays are similar to matrices but can have more than two dimensions.

Arrays are created with the array() function.

array(vector, dimensions, dimnames)

a <- matrix(c(1,1,1,1) , 2, 2)

b <- matrix(c(2,2,2,2) , 2, 2)

foo <- array(c(a,b), c(2,2,2))

Array elements can be accessed in the same way as matrix elements, with one subscript per dimension.

foo[1,,] # returns the first row of every layer (as a 2 x 2 matrix)

foo[2,,] # returns the second row of every layer (as a 2 x 2 matrix)

foo[2,1,] # returns the element in row 2, column 1 of every layer
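The third subscript selects a whole layer, which is often the most intuitive way to read a three-dimensional array. A sketch that rebuilds foo from the two matrices used earlier:

```r
a <- matrix(c(1, 1, 1, 1), 2, 2)
b <- matrix(c(2, 2, 2, 2), 2, 2)
foo <- array(c(a, b), c(2, 2, 2))

dim(foo)     # 2 2 2
foo[, , 1]   # the first layer, equal to a
foo[, , 2]   # the second layer, equal to b
foo[2, 1, ]  # 1 2: one value from each layer
```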

Data frame

Data frames are the most commonly used data structure in R.

A data frame is like a more general matrix: its columns can contain different modes of data (numeric, character, etc.).

A data frame is created with the data.frame() function.

data.frame(col1, col2, col3, ...)

name <- c("joe", "jhon", "Nancy")

sex <- c("M", "M", "F")

age <- c(27,26,26)

foo <- data.frame(name,sex,age)

Accessing data frame elements is straightforward. Columns can be accessed by name.

Example

foo$name # returns the name vector in the data frame

foo$age # returns the age vector in the data frame

foo$age[2] # returns the second element of the age vector in the data frame
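Rows can be selected as well, using the same bracket notation as a matrix. The following sketch rebuilds the data frame so that it runs on its own:

```r
name <- c("joe", "jhon", "Nancy")
sex <- c("M", "M", "F")
age <- c(27, 26, 26)
foo <- data.frame(name, sex, age)

nrow(foo)              # 3
ncol(foo)              # 3
foo[2, ]               # the second row, as a one-row data frame
foo[foo$age > 26, ]    # only the rows where age exceeds 26
foo[["age"]]           # same vector as foo$age
```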

Factors

Categorical variables in R are called factors.

Status (poor, improved, excellent) and Gender (Male, Female) are good examples of categorical variables.

Factors are created using the factor() function.

gender <- c("Male", "Female", "Female", "Male")

status <- c("Poor", "Improved", "Excellent", "Poor", "Excellent")

factor_gender <- factor(gender) # factor_gender has two levels called Male and Female

factor_status <- factor(status) # factor_status has three levels called Poor, Improved and Excellent
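By default factor() orders the levels alphabetically, which may not match the natural order of the categories. The sketch below inspects the levels and then builds an ordered factor; the name ordered_status and the chosen level order are assumptions for illustration.

```r
status <- c("Poor", "Improved", "Excellent", "Poor", "Excellent")
factor_status <- factor(status)

levels(factor_status)      # "Excellent" "Improved" "Poor"
as.integer(factor_status)  # 3 2 1 3 1: the internal level codes
table(factor_status)       # counts per level

# Hypothetical ordering from worst to best
ordered_status <- factor(status,
                         levels = c("Poor", "Improved", "Excellent"),
                         ordered = TRUE)
ordered_status[1] < ordered_status[3]  # TRUE: Poor ranks below Excellent
```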

List

Lists are the most complex data structure in R.

A list may contain a combination of vectors, matrices, data frames, and even other lists.

You create a list using the list() function.

vec <- c(1,2,3,4)

mat <- matrix(vec,2,2)

foo <- list(vec, mat)
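List components are extracted with double brackets, or with $ when the components are named. A brief sketch; the list named and its component names v and m are hypothetical.

```r
vec <- c(1, 2, 3, 4)
mat <- matrix(vec, 2, 2)
foo <- list(vec, mat)

foo[[1]]        # the vector: 1 2 3 4
foo[[2]][1, 2]  # 3: row 1, column 2 of the matrix
class(foo[1])   # "list": single brackets return a sub-list

named <- list(v = vec, m = mat)  # hypothetical component names
named$v         # same as named[["v"]]
```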

Data Import/Export

Import Excel File

Quite frequently, the sample data is in Excel format, and needs to be imported into R prior to use.

library(gdata) # load gdata package

help(read.xls) # documentation

mydata = read.xls("mydata.xls") # read from first sheet

Alternative package: XLConnect

library(XLConnect)

wk = loadWorkbook("mydata.xls")

df = readWorksheet(wk, sheet="Sheet1")

Import Minitab File

If the data file is in Minitab Portable Worksheet format, it can be opened with the function read.mtp from the foreign package. It returns a list of components in the Minitab worksheet.

library(foreign) # load the foreign package

help(read.mtp) # documentation

mydata = read.mtp("mydata.mtp") # read from .mtp file

Import Table File

A data table can reside in a text file. The cells inside the table are separated by blank characters. Here is an example of a table with 4 rows and 3 columns.

100 a1 b1
200 a2 b2
300 a3 b3
400 a4 b4

help(read.table) # documentation

mydata = read.table("mydata.txt")
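To try this without an existing file, the table can first be written to a temporary file. This is a minimal sketch; the temporary path stands in for mydata.txt and is an assumption of the example.

```r
# Write the example table to a temporary file (stand-in for mydata.txt)
path <- tempfile(fileext = ".txt")
writeLines(c("100 a1 b1", "200 a2 b2", "300 a3 b3", "400 a4 b4"), path)

mydata <- read.table(path)

dim(mydata)    # 4 3
names(mydata)  # "V1" "V2" "V3": default names when the file has no header
mydata$V1      # 100 200 300 400
```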

Import CSV File

The sample data can also be in comma-separated values (CSV) format. Each cell inside such a data file is separated by a special character, which is usually a comma.

help(read.csv) #documentation

mydata = read.csv("mydata.csv", sep=",")

Export Table file

help(write.table) # documentation

write.table(mydata, "c:/mydata.txt", sep="\t")

Export Excel file

library(xlsx)

help(write.xlsx) # documentation

write.xlsx(mydata, "c:/mydata.xlsx")

Export CSV file

help(write.csv)

write.csv(mydata, file = "mydata.csv")

Avoid writing the row names

write.csv(mydata, file = "mydata.csv", row.names=FALSE)
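A round trip through a temporary file shows what ends up on disk. In this sketch the small data frame and the temporary path are assumptions made so that the example runs on its own.

```r
# Hypothetical sample data
mydata <- data.frame(id = c(100, 200), label = c("a1", "a2"))

path <- tempfile(fileext = ".csv")
write.csv(mydata, file = path, row.names = FALSE)

readLines(path)[1]  # the header line, with the column names quoted

back <- read.csv(path)
names(back)         # "id" "label"
nrow(back)          # 2
```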

Knowledge Check

Every individual data value has a data type that tells us what sort of value it is.

A. TRUE

B. FALSE

Answer A

What happens when you execute the code vec <- c(1,"hello",TRUE)?

A. vec is assigned multiple values.

B. Nothing happens.

C. ERROR

D. vec has only one value and that is TRUE.

Answer A (R does not raise an error; it coerces all three values to character, giving "1", "hello", "TRUE")
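Running the statement is the easiest way to check the behaviour. In this minimal sketch R does not stop with an error; it coerces every element to the most flexible type present, which here is character.

```r
vec <- c(1, "hello", TRUE)

vec          # "1" "hello" "TRUE"
class(vec)   # "character"
length(vec)  # 3

c(1, TRUE)   # without a string, TRUE is coerced to the number 1: 1 1
```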

Which statement is TRUE?

A. Matrix is a three-dimensional collection of values that all have the same type.

B. A factor can be used to represent a categorical variable.

C. Vector is a two-dimensional collection of values that can have multiple mode (numeric, character, boolean).

D. At maximum a single data frame can hold only 20GB of data.

Answer B

What is the most appropriate data structure for the dataset below?

Name   Age  Gender
Jhon   24   M
Joe    24   M
Nancy  25   F

A. Matrix

B. Data frame

C. Array

D. List

Answer B

Which function is used to create an array?

A. a(vector, dimensions, dimnames)

B. create(vector, dimensions, dimnames)

C. array(vector, dimensions, dimnames)

D. a(vector,dimensions)

Answer C
