Resampling Methods with R
Engin YILDIZTEPE, Ph.D.
2017-2018 Fall
E.Yıldıztepe
Reference Books
• R manuals
• Braun, W.J., Murdoch, D.J., “A First Course in Statistical Programming with R”, Cambridge University Press, 2009.
• Kabacoff, R.I., “R in Action”, 2011.
• Matloff, N., “The Art of R Programming”, 2011.
• Efron, B., Tibshirani, R.J., “An Introduction to the Bootstrap”, Chapman & Hall, 1993.
• Davison, A.C., Hinkley, D.V., “Bootstrap Methods and their Application”, Cambridge University Press, 1997.
• Zieffler, A.S., Harring, J.R., Long, J.D., “Comparing Groups: Randomization and Bootstrap Methods Using R”, Wiley, 2011.
• Manly, B.F.J., “Randomization, Bootstrap and Monte Carlo Methods in Biology”, Chapman & Hall, 2007.
• Chihara, L., Hesterberg, T., “Mathematical Statistics with Resampling and R”, 2011.
Installation of R
• R can be downloaded from http://www.r-project.org/ .
• It runs on:
  – Microsoft Windows
  – a wide variety of UNIX platforms
  – macOS
• RStudio: http://www.rstudio.com
Getting Help
• Online help
• Help menu
  – Search
  – HTML help
• Manuals on the R Project web site
Getting Help
> help()
> help(mean)
> ?mean
> help.search("mean")
> RSiteSearch("mean")
> apropos("mean")
> ?Syntax
> ?Arithmetic
Calculating with R
• Prompt symbol (>)
• R can be used as a calculator:
> 2+2
[1] 4
> 2+2*3
[1] 8
> (2+2)*3
[1] 12
Operators: Arithmetic, Comparison, Logic
They are listed in precedence groups, from highest to lowest.

Operator              Function
( {                   Function calls and grouping expressions (respectively)
[ [[                  Indexing
:: :::                Access variables in a namespace
$ @                   Access named components, access slots
^                     Exponentiation (right to left)
-                     Unary minus
:                     Sequence operator
%any%                 Special operators (%%, %/%, etc.)
* /                   Multiply, divide
+ -                   Add, subtract (binary)
< > <= >= == !=       Comparison operators (less than, greater than, less than or equal to, greater than or equal to, equal to, not equal to)
!                     Logical negation
& &&                  And
| ||                  Or
~                     As in formulas
-> ->>                Rightward assignment
<- <<-                Assignment (right to left)
=                     Assignment (right to left)
?                     Help (unary and binary)
Within an expression, operators of equal precedence are evaluated from left to right (except those marked right to left above):
> 3 / 2 * 4
[1] 6
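The precedence rules in the table can be checked directly at the prompt. A minimal sketch (the variable names x1 to x5 are only illustrative) compares a few expressions with their parenthesized meaning:

```r
# Illustrative variable names; each line checks one precedence rule
x1 <- 3 / 2 * 4   # equal precedence, left to right: (3 / 2) * 4
x2 <- 2^3^2       # ^ is evaluated right to left: 2^(3^2)
x3 <- -2^2        # ^ binds tighter than unary minus: -(2^2)
x4 <- -1:3        # unary minus binds tighter than ':' : (-1):3
x5 <- 1:3 * 2     # ':' binds tighter than '*': (1:3) * 2
print(x1)  # 6
print(x2)  # 512
print(x3)  # -4
print(x4)  # -1 0 1 2 3
print(x5)  # 2 4 6
```

When in doubt, adding parentheses makes the intended order explicit.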
Some functions
%%                  Gives the remainder (modulus). Ex: 17 %% 5
%/%                 Gives the integer part of a division. Ex: 17 %/% 5
seq(a, b)           Generates a sequence from a to b (in steps of 1 by default). Ex: seq(1, 10), seq(1.575, 5.125, by=0.05)
abs(x)              Gives the absolute value of x. Ex: abs(-5)
sqrt(x)             Gives the square root of x.
log, log10, log2    log computes logarithms, by default natural logarithms; log10 computes common (base 10) logarithms and log2 computes binary (base 2) logarithms. The general form log(x, base) computes logarithms with base base.
exp                 Computes the exponential function.
round               Rounds the values in its first argument to the specified number of decimal places (default 0).
min, max            Return the minimum and maximum of the input values.
length(x)           Gets or sets the length of a vector.
sum(x)              Gives the sum of the elements in vector x.
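The functions above can be tried together in a short script. This is a sketch; the vector v and the names q and r are illustrative. It also checks the identity that links %/% and %%:

```r
# Integer division and remainder: 17 = 5 * 3 + 2
q <- 17 %/% 5    # integer part of the division (illustrative name)
r <- 17 %% 5     # remainder (illustrative name)
stopifnot(5 * q + r == 17)
print(c(quotient = q, remainder = r))

print(abs(-5))              # absolute value
print(sqrt(16))             # square root
print(log(100, base = 10))  # same as log10(100)
print(log2(8))              # binary logarithm
print(exp(1))               # the number e
print(round(3.14159, 2))    # round to 2 decimal places

v <- c(4, 8, 15, 16, 23, 42)   # illustrative data vector
print(c(min = min(v), max = max(v), length = length(v), sum = sum(v)))
```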
Assignment
• x <- 2 or x = 2
• Variable names can be built from letters, digits, the dot (.) and the underscore (_). The limitations are:
  – The name must not start with a digit, an underscore, or a dot followed by a digit.
  – Names are CASE-SENSITIVE (X and x do not refer to the same variable).
Vectors
• Vectors are variables that can be thought of as contiguous cells containing data.
• Cells are accessed through indexing operations (square brackets) such as a[1].
• R has six basic (“atomic”) vector types: logical, integer, real (double), complex, string (or character), and raw.
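The six atomic types can be inspected with typeof(). A short sketch (the list named examples is only an illustration); note that R reports the “real” type as "double":

```r
# One value of each atomic type (illustrative list)
examples <- list(TRUE, 2L, 3.5, 1 + 2i, "text", as.raw(255))
# typeof() returns the internal type of each element
types <- vapply(examples, typeof, character(1))
print(types)  # "logical" "integer" "double" "complex" "character" "raw"
```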
Data is a Vector
• The functions c, seq and rep are used to create vectors in various situations.
• The concatenation function c is used to define vectors:
> a <- c(1,3,5,6)
> a
[1] 1 3 5 6
• We can also concatenate vectors of more than one element, as in:
> b <- c(23, 44)
> ab <- c(a, b)
> ab
[1] 1 3 5 6 23 44
Data is a Vector
• We can make a copy:
> a1 <- a
• Set the first element of the copy to 0:
> a1[1] <- 0
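Because R copies on modification, changing the copy does not affect the original vector. A minimal check:

```r
a <- c(1, 3, 5, 6)
a1 <- a        # a1 is a copy of a
a1[1] <- 0     # modify only the copy
print(a1)      # 0 3 5 6
print(a)       # the original is unchanged: 1 3 5 6
```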
Vectors
• seq (“sequence”) is used for equidistant series of numbers:
> seq(1,10)
[1] 1 2 3 4 5 6 7 8 9 10
> seq(1,10,2)
[1] 1 3 5 7 9
> seq(1,2,0.3)
[1] 1.0 1.3 1.6 1.9
• rep (“replicate”) is used to generate repeated values:
> rep(a,2)
[1] 1 3 5 6 1 3 5 6
> rep(a, each=2)
[1] 1 1 3 3 5 5 6 6
> rep(a,1:4)
[1] 1 3 3 5 5 5 6 6 6 6
> rep(1:4,a)
[1] 1 2 2 2 3 3 3 3 3 4 4 4 4 4 4
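seq() and rep() accept a few more arguments worth knowing. A sketch using the documented length.out, by, times and each arguments (the names s1, s2, r1, r2 are illustrative):

```r
s1 <- seq(0, 1, length.out = 5)          # 5 equally spaced values: 0 0.25 0.5 0.75 1
s2 <- seq(10, 1, by = -3)                # decreasing sequence: 10 7 4 1
r1 <- rep(c("x", "y"), times = c(3, 1))  # "x" "x" "x" "y"
r2 <- rep(1:2, times = 2, each = 2)      # each is applied first: 1 1 2 2 1 1 2 2
print(s1)
print(s2)
print(r1)
print(r2)
```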
Exercises
• Arithmetic calculations:
> 7 * 4 + 3
> 3 + 7 * 4
> (3 + 7) * 4
> 1:10
> 1:10*3
> 3^1:10
Exercises
• Write out the required line of R code (use the rep() and seq() functions):
a) 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8
b) 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8
c) 0 -1 0 -1 1 0 1 0 2 1 2 1 3 2 3 2 4 3 4 3 5 4 5 4 6 5 6 5 7 6 7 6
Exercises
• Write out the required line of R code:
a) "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
b) 9 27 81 243 729 2187 6561 19683 59049
c) -1 0 1 8 27 64 125 216 343 512 729
• Calculate the mean value of a vector.
Logical vectors
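• A logical vector can only take the values TRUE and FALSE (and NA for a missing value).
> x <- c(TRUE,TRUE,FALSE,TRUE,FALSE)
> x
[1] TRUE TRUE FALSE TRUE FALSE
• You can also use the capital T and F (but T and F are ordinary variables that can be reassigned, so TRUE and FALSE are safer):
> x <- c(T,T,F,T,F)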
Logical vectors as subscript
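• If a logical vector is used as a subscript, the elements at the TRUE positions are selected; a logical subscript shorter than the vector is recycled:
> y <- 1:10
> y
[1] 1 2 3 4 5 6 7 8 9 10
> x
[1] TRUE TRUE FALSE TRUE FALSE
> y[x]
[1] 1 2 4 6 7 9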
Logical vectors: Example
> a <- 7:16
> b <- 19:10
> a
[1] 7 8 9 10 11 12 13 14 15 16
> b
[1] 19 18 17 16 15 14 13 12 11 10
> s <- a < b
> s
[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
Using the negative sign (-) as a subscript
• A negative subscript removes the corresponding elements:
> x <- 5:20
> x
[1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> x[-1]
[1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> x[-c(1,3,5)]
[1] 6 8 10 11 12 13 14 15 16 17 18 19 20
> y <- x[-c(1,length(x))]
> y
[1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19
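A common pitfall combines negative subscripts with operator precedence: unary minus binds tighter than ':', so -1:3 means (-1):3, and mixing negative and positive subscripts is an error. A sketch (the names first_dropped and res are illustrative):

```r
x <- 5:20
first_dropped <- x[-(1:3)]  # parentheses needed: drop the first three elements
print(first_dropped)        # 8 9 ... 20
print(-1:3)                 # -1 0 1 2 3, not -(1:3)
# Mixing negative and positive subscripts raises an error; capture its message
res <- tryCatch(x[-1:3], error = function(e) conditionMessage(e))
print(res)
```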
Character vectors
> colors <- c("red", "yellow", "blue")
> more.colors <- c(colors, "green", "magenta", "cyan")
• To take substrings, use substr(x, start, stop):
> substr(colors, 1, 2)
[1] "re" "ye" "bl"
• Use the paste() function for building up strings by concatenation:
> paste(colors, "flowers")
[1] "red flowers" "yellow flowers" "blue flowers"
Character vectors
> paste("several ", colors, "s", sep="")
[1] "several reds" "several yellows" "several blues"
> paste("I like", colors, collapse = ", ")
[1] "I like red, I like yellow, I like blue"
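A few related string functions from base R. This sketch shows that paste0() is paste() with sep = "", plus nchar() and toupper() (the names p1 and p2 are illustrative):

```r
colors <- c("red", "yellow", "blue")
p1 <- paste0("several ", colors, "s")            # paste0() uses sep = ""
p2 <- paste("several ", colors, "s", sep = "")
print(identical(p1, p2))                         # TRUE
print(nchar(colors))                             # number of characters: 3 6 4
print(toupper(substr(colors, 1, 1)))             # first letters in upper case: "R" "Y" "B"
print(paste(colors, collapse = "+"))             # one string: "red+yellow+blue"
```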
List of available objects
• The list of available objects in the specified environment can be viewed with ls():
> ls()
[1] "a" "df" "f" "m" "Orange" "y"
• The class of an object is viewed with class(objname):
> class(a)
[1] "integer"
• An unnecessary object is deleted from the workspace with the rm() command:
> rm(f)
> ls()
[1] "a" "df" "m" "Orange" "y"
• How can you remove all the objects in the workspace?
Matrices
• matrix() function
> m <- matrix(1:6)
> m
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5
[6,]    6
> m <- matrix(1:6, nrow=2, ncol=3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Matrices
• The byrow=T argument causes the matrix to be filled row by row rather than column by column (the default):
> m <- matrix(1:6, nrow=2, ncol=3, byrow=T)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
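Whatever the fill order, R stores a matrix column by column. A sketch (the names by_col and by_row are illustrative) relating byrow = TRUE to a transposed column-wise fill:

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # filled column by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row
# Filling a 3 x 2 matrix column-wise and transposing gives the same 2 x 3 matrix
print(identical(t(matrix(1:6, nrow = 3, ncol = 2)), by_row))  # TRUE
# as.vector() reads the stored values down the columns
print(as.vector(by_col))  # 1 2 3 4 5 6
print(as.vector(by_row))  # 1 4 2 5 3 6
```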
Matrices
> rownames(m)<-c("a","b")
> colnames(m)<-c("c1","c2","c3")
> m
c1 c2 c3
a 1 2 3
b 4 5 6
Matrices
• Accessing elements:
> m[1,2] # the value in the first row and second column
[1] 2
• Accessing whole rows or columns:
> m[1, ]
c1 c2 c3
 1  2  3
> m[ ,1]
a b
1 4
> m["a",]
c1 c2 c3
 1  2  3
> m[,"c3"]
a b
3 6
Matrices
• Extend the matrix by adding rows or columns:
> m1 <- rbind(m, c(7,8,9))
> m1
  c1 c2 c3
a  1  2  3
b  4  5  6
   7  8  9
> m2 <- cbind(m, c(7,8))
> m2
  c1 c2 c3  
a  1  2  3 7
b  4  5  6 8
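The new row or column above gets an empty name. Naming the argument in rbind() or cbind() supplies the name; in this sketch the names r3 and c4 are only illustrative:

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("a", "b"), c("c1", "c2", "c3")))
m1 <- rbind(m, r3 = c(7, 8, 9))  # argument name becomes the new row name
m2 <- cbind(m, c4 = c(7, 8))     # argument name becomes the new column name
print(m1)
print(m2)
```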
Matrices
• Some useful functions:
> dim(m)
[1] 2 3
> ncol(m)
[1] 3
> nrow(m)
[1] 2
> length(m)
[1] 6
> t(m)
   a b
c1 1 4
c2 2 5
c3 3 6
Exercises
• Let m be the following matrix:
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
• What is the required R code to compute the sum of the first row?
• How can you calculate the mean of the last column?
Arrays
• Arrays are a multidimensional extension of vectors. All elements of an array must be of the same mode.
• array() function: a <- array(data_vector, dim_vector)
> a <- array(1:24, dim=c(3,4,2))
> a
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24
Arrays
• Accessing elements:
> a[1,2,1] # the value at position [1,2,1]
[1] 4
• Accessing whole rows, columns or slices:
> a[1, , ]
     [,1] [,2]
[1,]    1   13
[2,]    4   16
[3,]    7   19
[4,]   10   22
> a[ ,1, ]
     [,1] [,2]
[1,]    1   13
[2,]    2   14
[3,]    3   15
> a[ , ,1]
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
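Like a matrix, an array is stored in column-major order, so element [i, j, k] of a 3 x 4 x 2 array sits at position i + (j - 1) * 3 + (k - 1) * 12 of the underlying vector. A sketch checking this (linear_index is an illustrative helper, not a base R function):

```r
a <- array(1:24, dim = c(3, 4, 2))
d <- dim(a)
# Illustrative helper: position of [i, j, k] in the underlying vector
linear_index <- function(i, j, k) i + (j - 1) * d[1] + (k - 1) * d[1] * d[2]
print(a[2, 3, 2])                # 20
print(a[linear_index(2, 3, 2)])  # 20, the same element
# Check every position of the array
ok <- TRUE
for (i in 1:d[1]) for (j in 1:d[2]) for (k in 1:d[3]) {
  ok <- ok && a[i, j, k] == a[linear_index(i, j, k)]
}
print(ok)  # TRUE
```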
Logical operations in R
> a <- c(TRUE, FALSE, FALSE, TRUE)
> b <- c(13, 7, 8, 0)
> b[a]
[1] 13 0
• In arithmetic, TRUE counts as 1 and FALSE as 0:
> sum(a)
[1] 2
> !a
[1] FALSE TRUE TRUE FALSE
• “If we attempt logical operations on a numerical vector, 0 is taken to be FALSE, and any nonzero value is taken to be TRUE:”
> a & b
[1] TRUE FALSE FALSE FALSE
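A few more logical operations on the same vectors, sketched with base R functions. The element-wise operators | and xor() compare position by position, while any() and all() summarize a whole vector, and && evaluates only single values:

```r
a <- c(TRUE, FALSE, FALSE, TRUE)
b <- c(13, 7, 8, 0)
print(a | b)            # element-wise OR (nonzero counts as TRUE): TRUE TRUE TRUE TRUE
print(xor(a, b != 0))   # exclusive OR: FALSE TRUE TRUE TRUE
print(any(a))           # is at least one element TRUE? TRUE
print(all(a))           # are all elements TRUE? FALSE
# && works on single values and stops as soon as the result is known
print(length(a) == 4 && a[1])  # TRUE
```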
Subset vectors & matrices using logical conditions
> b <- c(23, 22, 19, 16, 25, 30)
> b
[1] 23 22 19 16 25 30
Which values are greater than 20?
> b > 20
[1] TRUE TRUE FALSE FALSE TRUE TRUE
> b[b > 20]
[1] 23 22 25 30
How many values are greater than 20?
> sum(b > 20)
[1] 4
Subset vectors & matrices using logical conditions
What are the indices of the values greater than twenty?
> which(b > 20)
[1] 1 2 5 6
We can also use which() inside the brackets to get the values:
> b[which(b > 20)]
[1] 23 22 25 30
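The two forms above give the same result here, but they differ when the vector contains missing values: a comparison with NA yields NA, which selects an NA element, while which() ignores NA. A sketch with an illustrative vector b2:

```r
b2 <- c(23, NA, 19, 25)     # illustrative vector with a missing value
print(b2 > 20)              # TRUE NA FALSE TRUE
print(b2[b2 > 20])          # 23 NA 25: the NA comparison selects an NA element
print(b2[which(b2 > 20)])   # 23 25: which() skips NA
```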
Data Frame
• A data frame is a data table: a collection of column vectors of equal length, which may be of different types.
• Columns (variables) have names, and the data can be addressed by referring to these names.
• Example (column names are in Turkish: isim = name, yas = age, bol = department, puan = score):
> df <- data.frame(isim=c("ali","yeşim","murat","hakan","gülay"),
+                  yas=c(21,20,22,21,19),
+                  bol=c("tarih","fizik","mat","kimya","ist"),
+                  puan=c(93,78,88,91,90))
Data Frame
> df
   isim yas   bol puan
1   ali  21 tarih   93
2 yeşim  20 fizik   78
3 murat  22   mat   88
4 hakan  21 kimya   91
5 gülay  19   ist   90
> rownames(df) <- c("ogr1","ogr2","ogr3","ogr4","ogr5")
> df
      isim yas   bol puan
ogr1   ali  21 tarih   93
ogr2 yeşim  20 fizik   78
ogr3 murat  22   mat   88
ogr4 hakan  21 kimya   91
ogr5 gülay  19   ist   90
Data Frame
> class(df)
[1] "data.frame"
> dim(df)
[1] 5 4
> names(df)
[1] "isim" "yas" "bol" "puan"
> rownames(df)
[1] "ogr1" "ogr2" "ogr3" "ogr4" "ogr5"
Data Frame
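• To extract data from a data frame object, we can use indices or names:
> df[ ,2]
[1] 21 20 22 21 19
> df[ ,"yas"]
[1] 21 20 22 21 19
> class(df[ ,2])
[1] "numeric"
> class(df[ ,3])
[1] "factor"
• This output is from R versions before 4.0.0. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so class(df[ ,3]) returns "character" unless stringsAsFactors = TRUE is given.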
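Setting stringsAsFactors explicitly makes the result independent of the R version. A sketch (the names df_chr and df_fac are illustrative) comparing both settings on the bol column:

```r
bol <- c("tarih", "fizik", "mat", "kimya", "ist")
df_chr <- data.frame(bol = bol, stringsAsFactors = FALSE)  # keep strings as character
df_fac <- data.frame(bol = bol, stringsAsFactors = TRUE)   # convert strings to a factor
print(class(df_chr$bol))   # "character"
print(class(df_fac$bol))   # "factor"
print(levels(df_fac$bol))  # levels are sorted: "fizik" "ist" "kimya" "mat" "tarih"
```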
Data Frame
> df[2, ]
      isim yas   bol puan
ogr2 yeşim  20 fizik   78
> df["ogr2", ]
      isim yas   bol puan
ogr2 yeşim  20 fizik   78
> df[2:4, ]
      isim yas   bol puan
ogr2 yeşim  20 fizik   78
ogr3 murat  22   mat   88
ogr4 hakan  21 kimya   91
> class(df[2, ])
[1] "data.frame"
Data Frame
• A column called name in a data frame called dataframe can be accessed with dataframe$name:
> df$yas
[1] 21 20 22 21 19
> df$yas[3]
[1] 22
> df[3,2]
[1] 22
Sub-set elements of a Data Frame
• Example: if you are interested in the students whose puan (score) is 90 or more:
> df$puan>=90
[1] TRUE FALSE FALSE TRUE TRUE
• you can find the indices of these students:
> which(df$puan>=90)
[1] 1 4 5
> sec <- which(df$puan>=90)
> sec
[1] 1 4 5
> df$puan[sec]
[1] 93 91 90
Sub-set elements of a Data Frame
> df[sec,]
      isim yas   bol puan
ogr1   ali  21 tarih   93
ogr4 hakan  21 kimya   91
ogr5 gülay  19   ist   90
> df2 <- df[sec,]
> df2
      isim yas   bol puan
ogr1   ali  21 tarih   93
ogr4 hakan  21 kimya   91
ogr5 gülay  19   ist   90
Sub-set elements of a Data Frame
• The subset() function is often an easier way to select rows (and columns) of a data frame:
subset(dataframename, logical expression, select=expression)
> df2 <- subset(df, puan>=90)
> df2
      isim yas   bol puan
ogr1   ali  21 tarih   93
ogr4 hakan  21 kimya   91
ogr5 gülay  19   ist   90
> df2 <- subset(df, puan>=90, select=c(isim,puan))
> df2
      isim puan
ogr1   ali   93
ogr4 hakan   91
ogr5 gülay   90
Sub-set elements of a Data Frame
• With the negative sign (-) in select, the listed variables are dropped and all the others are kept:
> df3 <- subset(df, puan>=90, select=c(-isim,-puan))
> df3
     yas   bol
ogr1  21 tarih
ogr4  21 kimya
ogr5  19   ist
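subset() is a convenience: the same rows and columns can be selected with ordinary indexing. A sketch rebuilding the example data frame (the names via_subset and via_index are illustrative):

```r
df <- data.frame(isim = c("ali", "yeşim", "murat", "hakan", "gülay"),
                 yas  = c(21, 20, 22, 21, 19),
                 bol  = c("tarih", "fizik", "mat", "kimya", "ist"),
                 puan = c(93, 78, 88, 91, 90))
rownames(df) <- c("ogr1", "ogr2", "ogr3", "ogr4", "ogr5")

via_subset <- subset(df, puan >= 90, select = c(isim, puan))
via_index  <- df[df$puan >= 90, c("isim", "puan")]  # logical rows, named columns
print(all.equal(via_subset, via_index))             # TRUE
```

One documented difference: subset() treats a condition that evaluates to NA as FALSE, while df[cond, ] returns a row of NAs for it.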
Sub-set elements of a Data Frame
• The data() function lists the data sets available in the loaded packages.
• Example: the airquality data set
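The subsetting tools above work the same way on a built-in data set. A sketch on airquality from the datasets package (the name hot and the threshold of 90 °F are illustrative choices):

```r
data(airquality)                      # built-in data set (package datasets)
print(dim(airquality))                # 153 rows, 6 columns
print(names(airquality))              # "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
# Days hotter than 90 degrees Fahrenheit, keeping only three columns
hot <- subset(airquality, Temp > 90, select = c(Month, Day, Temp))
print(hot)
print(sum(is.na(airquality$Ozone)))   # number of missing Ozone measurements
```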
List Structures
• We can store multiple data types in the same object with list structures:
> mylist = list(first=1:10, second=letters[1:10], third=matrix(1:12,3))
> mylist
$first
[1] 1 2 3 4 5 6 7 8 9 10

$second
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

$third
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
List Structures
• The same result can be achieved using the names function after creating the (unnamed) list:
> mylist = list(1:10, letters[1:10], matrix(1:12,3))
> names(mylist) = c("first","second","third")
List Structures - subscripting
• For lists, there is a subtle distinction between a part of a list and the object which that part of the list represents. mylist[1] is itself a list containing one element, not the numeric vector, so mean() cannot use it:
> mylist[1]
$first
[1] 1 2 3 4 5 6 7 8 9 10

> mean(mylist[1])
[1] NA
Warning message:
In mean.default(mylist[1]) :
  argument is not numeric or logical: returning NA
List Structures - subscripting
• R provides two convenient ways to resolve this issue:
• Accessing the elements by name:
> mylist$first
[1] 1 2 3 4 5 6 7 8 9 10
> mean(mylist$first)
[1] 5.5
• The double bracket subscript operator (useful when the dollar sign notation would be inappropriate, for example when accessing elements through their index or through a name stored in a character variable):
> mylist[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
> mean(mylist[[1]])
[1] 5.5
• mylist[["first"]] is the same as mylist$first
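The difference between [ ] and [[ ]], and the case of a name stored in a variable, can be checked directly. In this sketch the variable name field is illustrative:

```r
mylist <- list(first = 1:10, second = letters[1:10], third = matrix(1:12, 3))
print(class(mylist[1]))       # "list": single brackets return a sub-list
print(class(mylist[[1]]))     # "integer": double brackets return the element itself
field <- "second"             # illustrative: element name stored in a character variable
print(mylist[[field]])        # works: "a" ... "j"
print(is.null(mylist$field))  # TRUE: $ looks for an element literally named "field"
```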
List Structures - Concatenating
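• Adding an element to a list:
> mylist[4] <- list(matrix(1:10,5))
• Concatenating several lists (listB, listC, … stand for other lists):
> mylist2 <- c(mylist, listB, listC,…)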
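A runnable sketch of both steps. Here listB and listC are hypothetical small lists made up only for illustration:

```r
mylist <- list(first = 1:10, second = letters[1:10], third = matrix(1:12, 3))
mylist[4] <- list(matrix(1:10, 5))    # add a fourth, unnamed element
listB <- list(fifth = "hello")        # hypothetical extra lists for illustration
listC <- list(sixth = c(TRUE, FALSE))
mylist2 <- c(mylist, listB, listC)    # c() joins the lists into one longer list
print(length(mylist2))                # 6
print(names(mylist2))                 # "first" "second" "third" "" "fifth" "sixth"
```

The same element could also be added with mylist[[4]] <- matrix(1:10, 5); with double brackets the value does not need to be wrapped in list().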
Exercises
• Calculate the sum $\sum_{k=1}^{n} k^2$ for n = 100 and n = 200.
• data() function
• Use the attitude dataset:
  – Use the summary() function.
  – Extract every column of the attitude data frame as a vector.
• Use the airquality dataset:
  – Extract every column of the airquality data frame as a vector.