Lecture 4 Xiaotong Suo

Intro to R Lecture 4



The fourth lecture of intro to R programming


  • Lecture 4

    Xiaotong Suo

  • Homework 1

    Question 3

  • Today's agenda

    Data input/output; Graphics

  • Data input/output

    R can write matrices and data frames to a file using the function write.table, and read data from a file using read.table.

    If you have a tab-delimited file, use the function read.delim instead. If the file is comma-separated, use read.csv.

    Year,Student,Major
    2009, John Doe,Statistics
    2009, Bart Simpson, Mathematics

    The above is an example of a comma-separated file. A tab-delimited file is the same except that tabs are used as the separator.
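    This is a minimal sketch of reading the example above without creating a file. Both read.csv and read.delim accept the data as a string through the text argument. The names csv_txt, tsv_txt, students and students_tab are illustrative. The option strip.white = TRUE is an addition that drops the blank after some commas. The sketch assumes R 4.0.0 or later, where strings are not converted to factors by default.

```r
# The comma-separated example above, kept in a string instead of a file
csv_txt <- "Year,Student,Major
2009, John Doe,Statistics
2009, Bart Simpson, Mathematics"

# strip.white = TRUE removes the blanks that follow some of the commas
students <- read.csv(text = csv_txt, strip.white = TRUE)
str(students)

# The same records, tab-delimited, read with read.delim
tsv_txt <- "Year\tStudent\tMajor\n2009\tJohn Doe\tStatistics\n2009\tBart Simpson\tMathematics"
students_tab <- read.delim(text = tsv_txt)

# Both readers should recover the same data frame
all.equal(students, students_tab)
```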

  • Data input/output continued

    The data set airquality ships with R and gives daily air-quality and weather measurements in New York City from May to September 1973. Load that data set into a data frame and save it to a file.

  • Data input/output continued

    dt = airquality
    write.table(dt, "Airquality.dt", col.names = TRUE, row.names = FALSE, sep = " ", na = "Missing")

    You could also use write.csv. See the help documentation for details.
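    This sketch makes the same write.table call and then peeks at the result. A temporary file from tempfile() stands in for "Airquality.dt", so nothing is left in the working directory. By default (quote = TRUE) write.table puts the column names in double quotes. Missing values appear as the string "Missing".

```r
# Write airquality as on the slide, but to a temporary file
dt <- airquality
out_file <- tempfile(fileext = ".dt")
write.table(dt, out_file, col.names = TRUE, row.names = FALSE,
            sep = " ", na = "Missing")

# Header line, then one line per row; NA is written as "Missing"
first_lines <- readLines(out_file, n = 6)
print(first_lines)

# write.csv is the comma-separated variant
csv_file <- tempfile(fileext = ".csv")
write.csv(dt, csv_file, row.names = FALSE)
readLines(csv_file, n = 2)
```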

  • Data input/output continued

    Things to keep in mind when reading from or writing to a file:

    Header: whether the file has a first row giving the names of the variables.

    Separator: which character separates the fields: space, comma, or tab.

    Missing data: which character strings stand for missing values.

    Factors: do you want R to convert character variables to factors? Use the options stringsAsFactors and as.is.
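    Here is a small illustration of na.strings, stringsAsFactors and as.is. It uses a made-up three-line table passed through the text argument; the table and the names d1 to d3 are hypothetical. It assumes R 4.0.0 or later, where stringsAsFactors defaults to FALSE.

```r
# A small whitespace-separated table; "?" marks a missing value
txt <- "Year Student Major
2009 John Statistics
2009 Bart ?"

# na.strings tells R which strings mean "missing"
d1 <- read.table(text = txt, header = TRUE, na.strings = "?")

# stringsAsFactors = TRUE converts every character column to a factor ...
d2 <- read.table(text = txt, header = TRUE, na.strings = "?",
                 stringsAsFactors = TRUE)

# ... while as.is names columns to keep as character anyway
d3 <- read.table(text = txt, header = TRUE, na.strings = "?",
                 stringsAsFactors = TRUE, as.is = "Student")

sapply(d1, class)
sapply(d2, class)
sapply(d3, class)
```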

  • Data input/output continued

    The general syntax of read.table:

    mydata = read.table("filename.dat", header = FALSE, sep = " ", dec = ".", col.names = c("V1", "V2"), na.strings = "NA")

  • Data input/output continued

    Let's try it with the file we just saved:

    dtNew = read.table("Airquality.dt", header = TRUE, sep = " ", dec = ".", na.strings = "Missing")
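    This sketch runs the full round trip, with a temporary file in place of "Airquality.dt". The write and read options must agree: the same separator, and "Missing" as the NA marker. When they do, the data read back should match airquality, including its missing values.

```r
# Save airquality as on the previous slides, to a temporary file
dt <- airquality
f <- tempfile(fileext = ".dt")
write.table(dt, f, col.names = TRUE, row.names = FALSE, sep = " ", na = "Missing")

# Read it back, declaring "Missing" as the missing-value marker
dtNew <- read.table(f, header = TRUE, sep = " ", dec = ".", na.strings = "Missing")

# The round trip should reproduce the original data, NAs included
all.equal(dtNew, dt)
colSums(is.na(dtNew))
```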

  • Data input/output continued

    As mentioned earlier, if you have a tab-delimited file, use the function read.delim instead. If the file is comma-separated, use read.csv.

    Another function to read text data is read.fwf, which works with fixed-width text data. See the user manual for more detail.

    Yet another function to read data from a file is scan. It is more efficient when reading data of a single mode. See the user manual.
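    This sketch tries both functions on made-up data. The fixed-width file, its column widths and the temperature values are all illustrative. read.fwf cuts each line at the given widths. scan returns a plain vector rather than a data frame.

```r
# Fixed-width records: Year (4 chars), Student (12 chars), Major (up to 11)
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("2009John Doe    Statistics",
             "2009Bart SimpsonMathematics"), fwf_file)

people <- read.fwf(fwf_file, widths = c(4, 12, 11),
                   col.names = c("Year", "Student", "Major"))
people$Student <- trimws(people$Student)  # drop the padding blanks
people

# scan reads a stream of values of a single mode into a vector
temps <- scan(text = "67 72 74 62 56", what = numeric(), quiet = TRUE)
mean(temps)  # 66.2
```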

  • Data input/output

    Exercise: The file Earmarksbymember08.xls is an Excel file available in coursework. Load this file in R.

  • Graphics

    R has powerful graphics capabilities.

    To plot a graph you need a graphical device. If you call a plotting function without opening one, R automatically creates a graphical device for you.

    On macOS, use the function quartz() to create a graphical device.

    On Windows systems, use windows().

    A graphical device can also be a file, in which case your graphs are sent to that file. Use the functions pdf() or postscript().
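    This is a minimal sketch of sending a plot to a PDF file. The file name comes from tempfile(), and the 6 x 4 inch page size is an arbitrary choice. The file is only complete once the device is closed with dev.off().

```r
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file, width = 6, height = 4)   # open a file device (size in inches)
boxplot(airquality$Temp, main = "Temperature in NY city")
invisible(dev.off())                   # close the device; this finishes the file
file.exists(pdf_file)
```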

  • Graphics continued

    Example: the airquality data set.

    dt = airquality
    names(dt)
    boxplot(dt$Temp)
    plot(dt$Temp, type = "l")
    plot(dt$Temp, dt$Wind, type = "p")
    plot(dt$Temp, dt$Wind, type = "p", xlab = "Temperature", ylab = "Wind",
         main = "Wind vs Temp. in NY city May-Sept. 73")

  • Graphics

    Continuing with the airquality data set, suppose we want a boxplot of the temperatures for each month.

    dt$Month = as.factor(dt$Month)
    boxplot(Temp ~ Month, data = dt, names = c("May", "June", "July", "August", "Sept."))
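    What the monthly boxplot draws can be checked numerically, because boxplot() computes each box with boxplot.stats(). This sketch computes the five statistics per month without drawing anything; the variable names are illustrative.

```r
dt <- airquality
month_lab <- c("May", "June", "July", "August", "Sept.")

# factor() with labels maps the month codes 5..9 to names
temps <- split(dt$Temp, factor(dt$Month, labels = month_lab))

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
box_stats <- sapply(temps, function(t) boxplot.stats(t)$stats)
box_stats
```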

  • Graphics

    What if we want multiple graphs on the same graphical device? There are many ways to do this.

    One simple possibility is layout.

  • Graphics

    Example: the airquality data set.

    m = matrix(c(1, 2), ncol = 2)
    layout(m)
    layout.show(2)
    boxplot(dt$Temp, main = "Boxplot")
    plot(dt$Temp, type = "l", main = "Time series plot")

  • Graphics

    Example: the airquality data set.

    m = matrix(c(1, 3, 2, 3), 2, 2)
    layout(m)
    layout.show(3)
    boxplot(dt$Temp, main = "Boxplot Temp. in NY city")
    plot(dt$Temp, type = "l", main = "Temp. in NY city")
    plot(dt$Temp, dt$Wind, type = "p", xlab = "Temp", ylab = "Wind", main = "xyplot")
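    This sketch shows how the layout matrix is read. matrix() fills by column, so c(1, 3, 2, 3) puts plots 1 and 2 side by side on top and plot 3 across the whole bottom row. pdf(NULL) is used only so the code runs without a screen device.

```r
# The layout matrix from the slide, filled column by column
m <- matrix(c(1, 3, 2, 3), 2, 2)
m
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    3

pdf(NULL)                 # a device that draws nothing
n_fig <- layout(m)        # layout() returns the number of figure regions
boxplot(airquality$Temp, main = "Boxplot Temp. in NY city")       # region 1
plot(airquality$Temp, type = "l", main = "Temp. in NY city")      # region 2
plot(airquality$Temp, airquality$Wind, xlab = "Temp", ylab = "Wind",
     main = "xyplot")                                             # region 3
invisible(dev.off())
```

    For an equal-sized grid, par(mfrow = c(nrow, ncol)) is a simpler alternative to layout().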

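  • Graphics continued

    What if we want to put multiple graphs on the same plot?

    Issue par(new = TRUE) first: the next high-level plotting call then draws on top of the current plot instead of starting a new one.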

  • Graphics continued

    A few plotting functions in R:

    plot(x): plot the values of vector x.
    plot(x, y): bivariate plot of y as a function of x.
    boxplot(x): box-and-whiskers plot.
    hist(x): histogram of x.
    ... and many others. See the R manuals by typing help.start().

  • Graphics continued

    Example:

    n = 10000; X = rnorm(n)
    hist(X, breaks = 200, prob = TRUE, col = "blue", xlim = c(-4, 4), ylim = c(0, 0.4))
    par(new = TRUE)
    curve(dnorm, xlim = c(-4, 4), ylim = c(0, 0.4), lwd = 2, col = "red", xlab = "", ylab = "")
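    This sketch shows why the overlay lines up. With prob = TRUE the histogram shows densities, so its total area is 1, like that of dnorm. The seed is arbitrary. As an alternative to par(new = TRUE), curve() with add = TRUE draws on the histogram's own axes, so the axis limits need not be repeated.

```r
set.seed(1)   # arbitrary seed, for reproducibility
n <- 10000
X <- rnorm(n)

# plot = FALSE returns the histogram object without drawing it
h <- hist(X, breaks = 200, plot = FALSE)

# Density heights times bar widths add up to 1
sum(h$density * diff(h$breaks))

# Overlay with add = TRUE instead of par(new = TRUE)
pdf(NULL)     # a device that draws nothing, so this runs anywhere
hist(X, breaks = 200, prob = TRUE, col = "blue", xlim = c(-4, 4), ylim = c(0, 0.4))
curve(dnorm, add = TRUE, lwd = 2, col = "red")
invisible(dev.off())
```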

  • Graphics

    Example:

    X = rnorm(100); Y = rnorm(100)
    m = matrix(c(1, 2), ncol = 2)
    layout(m)
    plot(X, Y)
    plot(X, Y, xlab = "100 Normal rvs", ylab = "100 Normal rvs", col = "blue", pch = 4,
         main = "Example of plot in R")

  • Graphics continued

    Exercise: the California freeway performance measurement system. The data are in flow-occ-table.txt in coursework. Download the file to your computer, load it in R using read.table, and practice with the following code.

  • Graphics continued

    dt = read.table("flow-occ-table.txt", header = TRUE, sep = ",")
    names(dt)
    Ind = complete.cases(dt)
    sum(Ind); length(dt[, 1])
    attach(dt)
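    flow-occ-table.txt is not available here, so this sketch uses airquality as a stand-in to show what complete.cases() reports. It also uses with() instead of attach(). This swap-in reaches the columns without adding them to the search path.

```r
dt <- airquality
Ind <- complete.cases(dt)   # TRUE for rows with no NA in any column
sum(Ind)                    # number of complete rows
length(dt[, 1])             # total number of rows, same as nrow(dt)

# with() evaluates an expression using the columns of a data frame
with(dt[Ind, ], mean(Temp))
```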

  • Graphics continued

    m = matrix(c(1, 5, 2, 5, 3, 5, 4, 5), ncol = 4)
    layout(m)
    boxplot(Flow1, Flow2, Flow3, names = c("Flow1", "Flow2", "Flow3"), main = "Boxplots flows")
    boxplot(Occ1, Occ2, Occ3, names = c("Occ1", "Occ2", "Occ3"), main = "Boxplots Occup.")
    plot(Occ2, Flow2, type = "p", col = "blue", main = "Flow vs Occup. for Lane 2")
    plot(Occ3, Flow3, type = "p", col = "red", main = "Flow vs Occup. for Lane 3")

  • Graphics continued

    plot(Occ1, type = "l", xlim = c(0, 1700), ylim = c(0, 0.5), col = "green")
    par(new = TRUE)
    plot(Occ2, type = "l", xlim = c(0, 1700), ylim = c(0, 0.5), col = "blue")
    par(new = TRUE)
    plot(Occ3, type = "l", xlim = c(0, 1700), ylim = c(0, 0.5), col = "red", main = "Occup. for Lane 1, 2 and 3")

  • Graphics

    legend(x = "top", legend = c("Lane 1", "Lane 2", "Lane 3"), col = c("green", "blue", "red"), lty = c(1, 1, 1))

  • ggplot2

    http://cran.r-project.org/web/packages/ggplot2/index.html

    It produces much nicer plots. Install the package first with install.packages("ggplot2"), then type library(ggplot2).

  • Control structures

    So far we have learned some basic aspects of R: working with its basic objects, input/output, and graphics. Here we turn to the more general task of writing computer programs in R.

  • Control structures continued

    An important component of a programming language is its control structures, which are used to implement repetitive tasks.

    The R programming language has control structures similar to those of C.

  • For loops

    Loops are used to carry out a sequence of related operations without having to write the code for each step explicitly. For instance, suppose we want to calculate

    $\sum_{i=1}^{10} i$

  • For loops continued

    x = 0
    for (i in 1:10) {
      x = x + i
    }
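    This minimal sketch checks the loop against the vectorized sum and against the closed form n(n + 1)/2 for n = 10:

```r
x <- 0
for (i in 1:10) {
  x <- x + i      # accumulate 1 + 2 + ... + 10
}
x                 # 55
sum(1:10)         # vectorized equivalent: 55
10 * (10 + 1) / 2 # closed form n(n + 1)/2: 55
```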

  • For loops

    In the above program, x is an accumulator variable, meaning that its value is repeatedly updated while the program runs.

    Always remember to initialize accumulator variables (to zero in the example).

    To see what happens at each step, we can add a print statement inside the loop body:

    x = 0
    for (i in 1:10) {
      x = x + i
      print(c(i, x))
    }

  • For loops

    The general structure of for loops:

    for (var in seq) expr

    or

    for (var in seq) {
      expr
    }
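    This short sketch makes two points about seq. First, seq can be any vector, such as a few month names. Second, when looping over positions, seq_along(v) is a safer choice than 1:length(v): when v is empty, 1:length(v) becomes 1:0, which is c(1, 0).

```r
# seq can be any vector, not only 1:n
for (m in c("May", "June", "July")) {
  print(m)
}

v <- numeric(0)           # an empty vector
length(1:length(v))       # 2: 1:0 is c(1, 0), so a loop would run twice
length(seq_along(v))      # 0: a loop over this is skipped
count <- 0
for (i in seq_along(v)) count <- count + 1
count                     # 0
```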

  • For loops continued

    Exercise: Given a matrix A, write a for loop that calculates the sum of each row of A.
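    Here is one possible solution sketch, using a hypothetical 2 x 3 matrix A. Following the advice above, the accumulator row_sums is initialized to zeros before the loop.

```r
A <- matrix(1:6, nrow = 2)      # hypothetical example: rows (1,3,5) and (2,4,6)
row_sums <- numeric(nrow(A))    # accumulator, one zero per row
for (r in 1:nrow(A)) {
  row_sums[r] <- sum(A[r, ])    # add up the entries of row r
}
row_sums                        # 9 12
```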

  • For loops continued

    This is an example of a trivial for loop. There is no need to write such loops in R, because R provides a family of functions for exactly this purpose: the apply functions.

    Often the apply functions even lead to faster code (but not always).
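    This sketch shows the apply alternative on the same kind of hypothetical matrix. MARGIN = 1 applies the function to each row, and MARGIN = 2 to each column. rowSums() and colSums() are dedicated functions for the common case of sums.

```r
A <- matrix(1:6, nrow = 2)   # hypothetical example matrix

# apply(X, MARGIN, FUN)
apply(A, 1, sum)   # row sums: 9 12
apply(A, 2, sum)   # column sums: 3 7 11

# Dedicated functions for plain sums
rowSums(A)         # 9 12
```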

  • Next lecture

    More control structures; R in statistics (linear regression, etc.)