Introduction to R
R-peer-group
QUB
February 5, 2013
R-peer-group (QUB) Session 1 February 5, 2013 1 / 46
Introductions
Some things to start
To download this presentation follow this link: http://tinyurl.com/aone2s6
Join the biology-r-users list by emailing Stephen Fowler <[email protected]>
This mailing list can be used for:
-> Asking for help with R
-> Giving help with R
-> Telling people about something cool you’ve learned in R
-> Sharing R resources (e.g. links, pdfs)
-> Discussing these sessions
A handy list of useful resources can be downloaded from this link: http://tinyurl.com/awg76t7
Introductions
Who are the R-peer-group?
Helene Bovy
What do I use R for? All my analyses!
Research area? Invasion ecology, behavioural ecology. In the past, population dynamics and image analysis.
Favorite feature? It makes you think about what you’re doing (and understand!)
Helene Bovy
What has R ever done for us?
Mark Emmerson
Mark’s QUB Profile
Kevin Keenan
What do I use R for? Everything! (Genetics data analysis, folder organisation, file re-namer, plotting, programming, LaTeX (Sweave), software development, bioinformatics, data management, data manipulation, GIS)
Research area? Evolutionary and population genetics.
Favorite feature? It’s free!
Kevin Keenan
What has R ever done for us?
Ruth Kelly
What do I use R for? Statistical analyses (e.g. GLMMs, GAMs, multivariate analyses and basic simulation models). I still use ArcView GIS for some spatial work and MaxEnt for distribution modelling.
Research area? Spatial ecology and botany
Favorite feature? Repeatability - stored code allows me to exactly replicate analyses and to remember exactly what I did!
Ruth Kelly
What has R ever done for us? Made analyses much faster (especially model selection and map calculations)
Jack Lennon
What do I use R for?
-> Spatial analysis and modelling
-> Figures for publication
-> To organise data and then call other more specialised packages (e.g. WinBUGS, C++ code) for analysis
Research area? Spatial ecology and macroecology
What has R ever done for us?
-> Brought more researchers into the modelling/analysis fold who hitherto couldn’t/wouldn’t get involved in FORTRAN or C programming.
-> Hugely accelerated the development and exchange of novel ways of analysing and displaying scientific data
Favorite feature? Data manipulation (e.g. aggregate() function) and plotting features.
Jack Lennon
What has R ever done for us?
Rob Mrowicki
What do I use R for? Ecological data analysis, making plots/figures
Research area? Marine biodiversity and ecosystem functioning
Favorite feature? Help is always available!
Rob Mrowicki
What has R ever done for us? All of my analysis and plotting (so far)... as well as making me feel clever.
Mark Ravinet
What do I use R for? Basic data manipulation and analysis, mixed models, Bayesian modelling of stable isotope data, phylogenetic analysis and plotting pretty figures.
Research area? Evolutionary biology and ecological speciation.
Favorite feature? The ability to look at the script for a function and modify it to make it do what I’d like it to!
Mark Ravinet
What has R ever done for us?
Course content
The course is intended for all post-grads/post-docs/academics who want to learn R
It will cover a wide range of topics
-> you might want to attend one, a few, or all sessions
Table 1: A broad outline of themes within the course
Basic            Intermediate             Advanced
What is R?       GLM                      GIS
Arithmetic       GAM                      sPCA
T-test           Indexing nested objects  Reproducible research
Chi-square       Code automation          Writing functions
Read/Write data  Data manipulation        Analysis optimisations
Indexing         Plotting                 HPC
Course format
A focus on discussion (Q & A)
Informal
Subject themes (e.g. Multivariate statistics, Visualising data)
Emphasis on practical labs and demonstrations
2 hour sessions held every Tuesday (05/02/13 - ??/??/13)
Optional homework for some sessions
Course content is open to change for popular topic suggestions
What is R?
R is a language and environment for statistical computing and graphics
Language
-> Highly customisable, flexible and powerful
-> R was originally developed at the University of Auckland, NZ
Environment
-> Supports the integration of many processes into one single work-flow/pipeline
-> Highly extensible
What is R?
It is used widely in academic and commercial sectors
It is one of the most popular analytical software packages among professional data analysts
pros and cons
pros
-> Fully functional statistical programming environment
-> Open source
-> Cross platform: can be used on all popular operating systems
-> Excellent graphics capabilities
-> Thousands of (free) extension packages.
-> Large online community of users and developers
-> Analysis frameworks can be saved in reproducible formats.
-> High level programming language
cons
-> Steep learning curve (We’ll help here)
-> Minimal GUI capabilities
-> Analysing large data sets can be troublesome (??bigmemory)
-> Scripts cannot be compiled into stand-alone .exe programs
-> Interpreted language, so it can be slow.
-> Thousands of (free) extension packages.
What R can do for you
Basic calculator
> 10 * 10
[1] 100
Advanced calculator
Formula
$I_n(Q; J = j) = -p_j \log_e p_j + \sum_{i=1}^{K} \frac{p_{ij}}{K} \log_e p_{ij}$
Code
> In <- (-(p[j])*log(p[j])) + sum((p[i, j]/k)*log(p[i, j]))
What R can do for you
Writing Programs
> # define a program (function)
> letterCount <- function(infile){
+ if(!is.character(infile)){
+ infile <- as.character(infile)
+ }
+ phrase <- scan(infile, what = "", sep = "\n", quiet = TRUE)
+ letters1<-unlist(strsplit(phrase,""))
+ alpha<-c(letters[1:26],LETTERS[1:26])
+ sorter<-function(x){
+ if(is.element(x,alpha)==TRUE){
+ return (x)
+ } else {
+ return (NA)
+ }
+ }
+ lettersCount<-lapply(letters1,sorter)
+ uplowFix <- lapply(lettersCount, function(x){
+ if(is.element(x, LETTERS)){
+ return(x <- letters[which(LETTERS == x)])
+ } else {
+ return(x)
+ }
+ })
+ lettersCount<-na.omit(unlist(uplowFix))
+ x<-table(lettersCount)
+ plot(x, ylab = "Number Used", col = rainbow(26), las = 1)
+ return(x)
+ }
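The same kind of count can be sketched more compactly with vectorised functions. This is a minimal alternative, not the presenters' code: the name letter_count and the sample sentences are made up for illustration, and it takes lines of text rather than a file name and does not draw the plot.

```r
# Hypothetical compact variant of letterCount(): counts the letters a-z in a
# character vector of text lines, ignoring case and non-letter characters.
letter_count <- function(text_lines) {
  chars <- unlist(strsplit(tolower(text_lines), ""))  # split into single characters
  chars <- chars[chars %in% letters]                  # keep only a-z
  table(factor(chars, levels = letters))              # one count per letter, zeros included
}

counts <- letter_count(c("Hello World", "R is fun"))
counts["l"]   # how many times "l" occurs in the sample text
```

Using factor(..., levels = letters) keeps letters that never occur in the table with a count of zero, which makes tables from different texts directly comparable.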
What R can do for you
Running programs
> # run the program to see the results
> x <- letterCount("text.txt")
[Figure: bar plot of letter counts; x-axis "lettersCount", y-axis "Number Used"]
What R can do for you
Designing web applications
Stocks app
Global Biodiversity Facility
Price trajectories
Gene Networks
Finance Showreel
What R can do for you
Data analysis
-> General statistical tests (t.test, anova, glm, chisq.test, lm)
-> Specialised statistical packages (abc, BSgenome, cluster, igraph, shapefiles)
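As a rough illustration of one of the general tests listed above, the sketch below runs t.test on simulated data. The group names, sample sizes and means are assumptions chosen only for the example.

```r
set.seed(1)                                  # make the simulated data repeatable
control   <- rnorm(20, mean = 10, sd = 2)    # hypothetical control group
treatment <- rnorm(20, mean = 12, sd = 2)    # hypothetical treatment group

result <- t.test(control, treatment)         # Welch two-sample t-test (the default)
result$p.value                               # extract the p-value from the result
result$conf.int                              # confidence interval for the difference in means
```

The result is a list-like object, so individual pieces such as the p-value can be pulled out with $ and reused later in a script.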
Data Visualisation
-> Basic plots
[Figure: basic plot of y against x]
What R can do for you
Data visualisation
-> Animated plots
-> Advanced plots
[Figure: advanced plot with labels t1 to t91]
What R can do for you
Generate .kml files for use in GoogleEarth (R2G2).
What R can do for you
Having fun
-> Read XKCD comic strips
> library(RXKCD)
> getXKCD(searchXKCD(which = "programming")[1,"num"])
What R can do for you
Having fun!
-> Browsing twitter
library(twitteR)
rProg <- searchTwitter('#r programming', n = 500)
pythonProg <- searchTwitter('#python programming', n = 500)
cProg <- searchTwitter('#c programming', n = 500)
[Figure: bar plot titled "Programming language mentions in twitter"; bars for #R programming, #python programming and #c programming; y-axis "twitter mentions", 0 to 500]
Practical sessions
Exploring RStudio
Folder management, R projects, the R console, help, R scripts and more
Practical sessions
Objects in R
Vectors
Vectors are the simplest objects in R
-> Can contain one or many elements
They can contain character, integer, numeric, complex, logical or raw value classes (atomic)
All elements in a vector must be the same class (e.g. all numeric)
Vectors can be manually passed to the R console or assigned to variable names
-> Manual
> c(1,2,3,4,5) # Use c to combine elements
[1] 1 2 3 4 5
-> Assignment
> x <- c(1,2,3,4,5) # Assign the vector to the variable x
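The rule that all elements share one class can be seen directly with class(). In this small sketch the variable names nums and mixed are made up for illustration.

```r
nums  <- c(1, 2, 3)          # all numeric
mixed <- c(1, "two", TRUE)   # mixing classes forces every element to character

class(nums)    # "numeric"
class(mixed)   # "character"
mixed          # "1" "two" "TRUE"
```

R does not raise an error when classes are mixed; it silently converts everything to the most general class, so it is worth checking class() when a vector behaves unexpectedly.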
Vectors
If we have two numeric/complex/integer/logical vectors we can add them together (arithmetic is not defined for raw vectors)
> x <- c(1,2,3,4,5) # Create a vector variable x
> y <- c(5,4,3,2,1) # Create a vector variable y
> x + y # sum of x and y
[1] 6 6 6 6 6
Or divide them
> x / y # x divided by y
[1] 0.2 0.5 1.0 2.0 5.0
Or multiply them
> x * y # x multiplied by y
[1] 5 8 9 8 5
> # Assignment
> z <- x * y # assign the value x * y to the
> # variable z
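Arithmetic also works between a vector and a single number, or a shorter vector, because R reuses ("recycles") the shorter operand. The sketch below uses a made-up vector named v so it does not overwrite x, y or z.

```r
v <- 1:6          # hypothetical vector used only for this example

v + 10            # the single value 10 is added to every element: 11 12 13 14 15 16
v * c(1, 10)      # c(1, 10) is reused in turn: 1 20 3 40 5 60
```

Recycling is convenient, but if the longer length is not a multiple of the shorter one R gives a warning, which usually signals a mistake.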
Vectors
What if you wanted to find out what the value of a particular element of a variable is?
-> Index mapping
> x # print x
[1] 1 2 3 4 5
> x[3] # 3rd element of x
[1] 3
> x[2:4] # elements 2 to 4 of x
[1] 2 3 4
> x[-4] # all elements of x except the 4th
[1] 1 2 3 5
> x[3] * y[3] # Multiply the 3rd elements of x and y
[1] 9
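Indexes do not have to be single numbers or ranges. As a small extension of the examples above, a vector can also be indexed by a vector of positions or by a logical test; the variable name v is made up for illustration.

```r
v <- c(1, 2, 3, 4, 5)   # hypothetical vector used only for this example

v[c(1, 5)]      # 1st and 5th elements: 1 5
v[-c(1, 2)]     # everything except the first two: 3 4 5
v[v > 2]        # elements that pass a logical test: 3 4 5
which(v > 2)    # positions (not values) that pass the test: 3 4 5
```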
Vectors
Using index mapping we can change the value of individual elements
> x[1] <- 50 # Make the 1st element of x = 50
> x # print x
[1] 50 2 3 4 5
There are many ways to manually create vectors in R
> # Using the colon, ':'
> x <- 1:10
> # Using the combine function 'c'
> x <- c(1,2,3,4,5,6,7,8,9,10)
> # Using the function 'seq'
> x <- seq(from = 1, to = 10, by = 1)
> # Create a string vector
> y <- c("apples", "oranges")
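A few further base R ways of building vectors, shown as a short sketch. The values are arbitrary and chosen only for illustration.

```r
seq(0, 1, length.out = 5)       # 5 evenly spaced values: 0.00 0.25 0.50 0.75 1.00
rep(c("a", "b"), times = 2)     # repeat the whole vector: "a" "b" "a" "b"
rep(c("a", "b"), each = 2)      # repeat each element: "a" "a" "b" "b"
numeric(3)                      # a numeric vector of zeros: 0 0 0
```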
Do it yourself
Try to do the following few tasks. Instructors will walk around to help you.
1 Create a character vector and assign it to the variable myName. The vector should contain two character elements corresponding to your first and second names. Print myName to the console.
2 Create a numeric vector x, with elements from 5 to 100 increasing in steps of 5.
Hint-Remember there is more than one way to create a vector in R. Try typing each of the following (exactly as shown here) into the R console to see which way you should use.
-> ?colon
-> ?c
-> ?seq
3 Assign the variable surName a value equal to the second element of your variable myName without re-typing your surname into the console.
Do it yourself
Questions continued.
4 Create a vector y. Each of its elements should equal the square of the corresponding element in x. For example, if the last element in x is 10, then the last element in y should be 100 (10^2 = 100).
Hint-You do not need to calculate each element separately. R has a feature called vectorisation, meaning it can apply an operation to an entire vector rather than having to do it to one value at a time. The operator of interest in this case is ^.
5 Assign the results of the following calculations to variable names of your choice.
-> The product of the vectors x and y
-> The result of x times itself
-> The result of taking the square root of the vector y
Hint-Taking the square root of a value is the same as raising it to the power of 0.5
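To see vectorisation and the ^ operator in action without giving away the tasks, here is a small sketch on an unrelated, made-up vector v.

```r
v <- c(1, 4, 9, 16)   # hypothetical vector, not the x or y from the tasks

v^2        # every element squared in one step: 1 16 81 256
v^0.5      # raising to the power 0.5 gives the square roots: 1 2 3 4
sqrt(v)    # the built-in function gives the same result: 1 2 3 4
```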
What else can you do with vectors?
We can also use R to answer logical questions about objects.
-> Imagine we want to ask R if the corresponding elements of two vectors are equal. We could do this by typing the following:
> x == y
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[8] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[15] FALSE FALSE FALSE FALSE FALSE FALSE
-> As expected, none of the values in x are equal to the corresponding values in y.
This type of query is important when checking the output of analyses. It is very useful for debugging scripts before applying them to your real data.
-> How do you think you could ask R if two elements from a vector were the same?
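When a comparison returns many TRUE/FALSE values, it often helps to summarise them. The sketch below uses made-up vectors a and b and shows base R helpers for this, plus a caution about comparing decimal numbers exactly.

```r
a <- c(1, 2, 3)   # hypothetical vectors used only for this example
b <- c(1, 5, 3)

a == b            # element-by-element comparison: TRUE FALSE TRUE
all(a == b)       # are all pairs equal? FALSE
any(a == b)       # is at least one pair equal? TRUE
sum(a == b)       # how many pairs are equal? 2

# Decimal arithmetic is only approximate, so exact tests can surprise you
sqrt(2)^2 == 2                     # FALSE, because of tiny rounding error
isTRUE(all.equal(sqrt(2)^2, 2))    # TRUE, all.equal() allows a small tolerance
```

The last two lines matter whenever a logical test involves calculated values such as square roots, means or standard deviations.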
What about functions?
Functions are small programs that make R so useful.
They are usually made up of very simple commands pieced together to carry out a complex process.
Functions accept arguments. These can be data or parameters.
We have already seen the function seq, which we know accepts three arguments in its most common use.
> seq(from = 1, to = 10, by = 1)
Most functions work in a very similar way to this function. You simply have to provide the correct arguments for them to work.
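Arguments can be supplied by name or by position. The following sketch, using only seq, shows that the three styles give the same vector; the variable names are made up for illustration.

```r
by_name     <- seq(from = 1, to = 10, by = 1)   # arguments matched by name
by_position <- seq(1, 10, 1)                    # arguments matched by their order
reordered   <- seq(by = 1, to = 10, from = 1)   # named arguments can come in any order

identical(by_name, by_position)   # TRUE
identical(by_name, reordered)     # TRUE
```

Naming arguments takes more typing, but it makes scripts easier to read and protects against supplying values in the wrong order.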
Using functions
We can find out about the arguments required by a function by using the built-in R help system.
At the most general level, we saw earlier that by typing help.start() we could launch a help manual for R. However, there are more specific help functions available.
help()
help.search()
?
??
Usually the ? help function is sufficient for finding out about functions. How does it work?
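A short sketch of how these help tools might be used for one function, here sd. The help pages are only opened when R is running interactively, and the search phrase is just an example.

```r
args(sd)              # print the function's arguments without opening a help page
names(formals(sd))    # the argument names as a vector: "x" "na.rm"

if (interactive()) {
  help("sd")                          # open the help page for sd
  ?sd                                 # shorthand for the line above
  help.search("standard deviation")   # search all installed help pages for a phrase
  ??"standard deviation"              # shorthand for the line above
  example(mean)                       # run the examples from the help page for mean
}
```

? needs the exact name of a function, while ?? searches by topic, which helps when you know what you want to do but not what the function is called.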
Using functions
Type the following commands into the R console.
?sum
?prod
?seq
?mean
?sqrt
Using the functions prod() and sqrt(), repeat the following tasks (N.B. read the help documentation to find out how to use these functions):
-> The product of the vectors x and y
-> The result of x times itself
-> The result of taking the square root of the vector y
-> Ask R if x is equal to the square root of y
Visualising variables
Often it is necessary to visualise variables to understand the types of statistical test you might need to do
In simple cases we can visualise variables using the plot function.
> x <- rnorm(n = 1000, mean = 10, sd = 5)
> y <- x^2
> plot(y ~ x)
[Figure: scatter plot of y against x produced by plot(y ~ x)]
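Two small additions to the plotting example, offered as a sketch: a histogram to look at the distribution of a single variable, and axis labels and a title on the scatter plot. The seed value and the label text are assumptions chosen for illustration.

```r
set.seed(42)                                # make the random numbers repeatable
x <- rnorm(n = 1000, mean = 10, sd = 5)     # 1000 draws from a normal distribution
y <- x^2

hist(x, main = "Distribution of x", xlab = "x")                           # shape of one variable
plot(y ~ x, main = "y against x", xlab = "x", ylab = "y = x squared")     # relationship between two
```

Without set.seed() the points would differ every time the code is run, because rnorm() draws new random values on each call.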
Some final challenges
An American colleague has collected samples of American eels (Anguilla rostrata) for a research paper you are working on. They have sent you a vector of adult fish lengths as well as the mean and standard deviation of the data as follows:
Lengths
[1] 46.58 44.25 41.27 44.70 46.94 4.23 46.25
[8] 44.03 46.04 46.55
Mean length
[1] 45.105
Standard deviation of the mean
[1] 1.7326
Some final challenges
Tasks
a) Check to make sure that the mean and standard deviation given to you correspond to the mean and standard deviation of the data you’ve recorded.
b) If there is a problem, what do you think it is, considering that your colleague has told you that all fish were longer than 40 inches?
-> Which value in the data is incorrect?
-> What should the correct value be?
-> Replace the incorrect value with the right one
c) Using logical tests, check that the mean and sd sent to you by your colleague are the correct ones.
Some final challenges
The values your colleague has sent you are well below the standard lengths reported for all populations of eels studied to date. You remember that the data have been recorded in imperial units (inches) rather than the expected metric units (cm). Using the functions and techniques demonstrated today:
d) Convert your vector of lengths from inches to cm, saving the results in a new vector. [1 inch = 2.54 cm]
Hint-There may be functions that can do this for you!
e) Calculate the new mean and standard deviation of the converted data.
f) Calculate the variance of both data sets. [$sd = \sqrt{Var(x)}$]
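As a way to check answers to tasks like these, the sketch below shows how the mean, standard deviation and variance respond to a change of units. It uses a made-up vector, not the eel data, and all.equal() rather than == because the results are decimal numbers.

```r
inches <- c(10, 12, 15, 11)    # hypothetical lengths in inches, not the eel data
cm     <- inches * 2.54        # vectorised unit conversion

all.equal(mean(cm), mean(inches) * 2.54)    # the mean scales by 2.54
all.equal(sd(cm),   sd(inches) * 2.54)      # the standard deviation scales by 2.54
all.equal(var(cm),  var(inches) * 2.54^2)   # the variance scales by 2.54 squared
all.equal(sd(cm),   sqrt(var(cm)))          # sd is the square root of the variance
```

Each line returns TRUE, so the converted summary statistics can be predicted from the originals without recalculating them from the data.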