Introduction to R: Part III
Statistics and Linear Modelling

Alexandre Perera i Lluna (1,2)

(1) Centre de Recerca en Enginyeria Biomèdica (CREB), Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial (ESAII), Universitat Politècnica de Catalunya, mailto:[email protected]

(2) Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN)

Jan 2011 / Introduction to R, Universitat Rovira i Virgili
Contents

1 Statistics: Univariate Data, Bivariate Data, Multivariate Data
2 Tests: Hypothesis tests, Two-population tests, t-Tests, χ²-Tests, Durbin-Watson, Graphical tests
3 Linear regression: Linear models, Regression analysis, Multivariate regression, Variance analysis
mean(), sd()

Mean, standard deviation, variance

Let's define a random variable with a normal distribution:

> x <- rnorm(100, mean = 2, sd = 0.5)

mean(), sd():

> mean(x)
[1] 2.016474
> median(x)
[1] 1.996165
> sd(x)
[1] 0.4814775
> var(x)
[1] 0.2318206
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.068 1.638 1.996 2.016 2.347 3.446
Quantiles
quantile() gives any quantile between 0 and 1:

> quantile(x, 0.25)
25%
1.637719
> quantile(x, c(0.1, 0.9))
10% 90%
1.429676 2.663940
Difference between the 1st and 3rd quartiles:

> IQR(x)
[1] 0.7096768
cut() categorizes continuous variables:

> summary(cut(x, c(min(x), mean(x), quantile(x, 0.75), max(x))))

(1.07,2.02] (2.02,2.35] (2.35,3.45]        NA's
         50          24          25           1
Histograms
> cuts <- quantile(x, seq(0, 1, 0.1))
> hist(x, breaks = cuts)
> rug(x)
[Figure: Histogram of x (density scale), breaks at the deciles, with rug marks]
boxplots
> boxplot(x, horizontal = TRUE,
+ col = "pink", xlab = "cm",
+ main = "Oscillation")
[Figure: horizontal boxplot of x, titled "Oscillation", x-axis in cm]
Density
> cortes <- quantile(x, seq(0,
+ 1, 0.1))
> hist(x, breaks = cortes)
> rug(x)
> lines(density(x, bw = "SJ"),
+ col = "red")
[Figure: Histogram of x with the kernel density estimate (bw = "SJ") overlaid in red and rug marks]
Factors
> language <- as.factor(c("french",
+ "french", "german", "german",
+ "english", "german", "french",
+ "english", "french", "german"))
> gender <- as.factor(c("man",
+ "woman", "woman", "woman",
+ "woman", "woman", "man",
+ "woman", "man", "man"))
> table(gender, language)
language
gender english french german
man 0 3 1
woman 2 1 3
> plot(table(language, gender),
+ col = c("pink", "blue"))
[Figure: mosaic plot of table(language, gender)]
barplot
> barplot(table(language, gender),
+ col = c("pink", "blue",
+ "green"), legend.text = levels(language))
[Figure: stacked barplot of language counts by gender, legend: english, french, german]
stripchart
1D plots, alternative to boxplot() for certain cases:
> attach(iris)
> stripchart(Sepal.Length ~ Species)
[Figure: stripchart of Sepal.Length by Species (setosa, versicolor, virginica)]
> boxplot(Sepal.Length ~ Species)
> detach(iris)
[Figure: boxplots of Sepal.Length by Species]
Formula notation in R
It is possible to use formula notation in R when variables are named (names()).

In the previous example: Sepal.Length ~ Species

Formulas in R

variable ~ group

(the variable, split per group)

This notation is used consistently throughout most R code.
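A minimal sketch of the variable ~ group idea, reusing the built-in iris data from the previous slide (the choice of summary function here is only illustrative): the same formula can drive both a grouped plot and a grouped summary.

# One formula, several functions: grouped boxplot and per-group means
data(iris)
boxplot(Sepal.Length ~ Species, data = iris)
aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)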
Formula Notation in R, II
Formulas

response ~ model

See help(formula)

Transformed variables are allowed:

log(Sepal.Length) ~ Species

Arithmetic expressions are allowed when wrapped in I():

I(Sepal.Length + Petal.Length) ~ Species

Heavily used in linear regression, but also in visualization functions.
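A small sketch of how such formulas feed a model fit; the use of lm() here anticipates the linear regression section and is only illustrative.

# log() on the response, I() to protect arithmetic on the left-hand side
data(iris)
fit.log <- lm(log(Sepal.Length) ~ Species, data = iris)
fit.sum <- lm(I(Sepal.Length + Petal.Length) ~ Species, data = iris)
summary(fit.sum)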
Contingency tables
> data(UCBAdmissions)
> UCBAdmissions[, , 1:2]

, , Dept = A

          Gender
Admit      Male Female
  Admitted  512     89
  Rejected  313     19

, , Dept = B

          Gender
Admit      Male Female
  Admitted  353     17
  Rejected  207      8

> DF <- as.data.frame(UCBAdmissions)
> head(DF)

     Admit Gender Dept Freq
1 Admitted   Male    A  512
2 Rejected   Male    A  313
3 Admitted Female    A   89
4 Rejected Female    A   19
5 Admitted   Male    B  353
6 Rejected   Male    B  207
xtabs(): contingency tables for multiple factors

Commonly used on data frames (data.frame)

> xtabs(Freq ~ Gender + Admit, DF)

        Admit
Gender   Admitted Rejected
  Male       1198     1493
  Female      557     1278
> summary(xtabs(Freq ~ ., DF))
Call: xtabs(formula = Freq ~ ., data = DF)
Number of cases in table: 4526
Number of factors: 3
Test for independence of all factors:
        Chisq = 2000.3, df = 16, p-value = 0
score plots I
> def.par <- par(no.readonly = TRUE)
> data(iris)
> xhist <- hist(iris$Petal.Length, plot = FALSE)
> yhist <- hist(iris$Sepal.Length, plot = FALSE)
> top <- max(c(xhist$counts, yhist$counts))
> xrange <- range(iris$Petal.Length)
> yrange <- range(iris$Sepal.Length)
> nf <- layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE), c(3, 1), c(1, 3), TRUE)
> layout.show(nf)
> par(mar = c(3, 3, 1, 1))
> plot(iris$Petal.Length, iris$Sepal.Length, xlim = xrange, ylim = yrange, xlab = "", ylab = "")
> par(mar = c(0, 3, 1, 1))
> barplot(xhist$counts, axes = FALSE, ylim = c(0, top), space = 0)
> par(mar = c(3, 0, 1, 1))
> barplot(yhist$counts, axes = FALSE, xlim = c(0, top), space = 0, horiz = TRUE)
> par(def.par)
score plots II

[Figure: scatterplot of Petal.Length vs Sepal.Length with marginal histograms, produced by the layout code on the previous slide]
lattice
library(lattice)
xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris)
[Figure: lattice xyplot of Sepal.Length against Sepal.Width, conditioned on Species (panels: setosa, versicolor, virginica)]
pairs()
> panel.hist <- function(x, ...) {
+     usr <- par("usr")
+     on.exit(par(usr))
+     par(usr = c(usr[1:2], 0, 1.5))
+     h <- hist(x, plot = FALSE, breaks = 10)
+     breaks <- h$breaks
+     nB <- length(breaks)
+     y <- h$counts
+     y <- y/max(y)
+     rect(breaks[-nB], 0, breaks[-1], y, col = "blue", ...)
+ }
> pairs(iris[, c(1:4)], panel = panel.smooth,
+     cex = 1.5, pch = 21, bg = as.numeric(iris$Species),
+     diag.panel = panel.hist)
[Figure: pairs() scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width, with smoothed panels, histograms on the diagonal, and points colored by Species]
dpq-functions
pnorm(q) returns the probability that a random variable takes a value lower than q (larger than q with lower.tail = FALSE)
> pnorm(c(0, 1))
[1] 0.5000000 0.8413447
> pnorm(1, lower.tail = F)
[1] 0.1586553
qnorm(p) answers the inverse question: which value corresponds to a given probability? (e.g. 0.75 for Q3)
> qnorm(c(0.75, 0.841345))
[1] 0.6744898 1.0000010
dnorm(x) theoretical density function
> curve(pnorm(x), -5, 5, col = "red", ylab = "", frame.plot = FALSE)
> curve(dnorm(x), -5, 5, col = "blue", add = TRUE)
> legend("topleft", legend = c("pnorm(x)", "dnorm(x)"), col = c("red", "blue"))
[Figure: pnorm(x) in red and dnorm(x) in blue over the range −5 to 5]
Standardization
With our random variable x (rnorm(), runif(), ...):

> x <- rnorm(100, mean = 2, sd = 0.5)
> z <- (x - 2)/0.5
> mean(z)
[1] 0.08849595
> sd(z)
[1] 0.986837
z-score:

> pnorm(z)[1:5]

[1] 0.1282827 0.7556697 0.2045399
[4] 0.4998171 0.1222717

> pnorm(x, mean = 2, sd = 0.5)[1:5]

[1] 0.1282827 0.7556697 0.2045399
[4] 0.4998171 0.1222717
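Base R's scale() performs the same standardization; a small sketch assuming the x defined above.

# scale() centers and rescales; by default it uses mean(x) and sd(x)
z.hat  <- scale(x)                           # sample-based z-scores
z.true <- scale(x, center = 2, scale = 0.5)  # using the known parameters
head(cbind(z.hat, z.true))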
t-test
t-statistic:
t = (x̄ − µ) / (s/√n)

> x <- rnorm(100, mean = 2, sd = 0.5)
> t.test(x)

        One Sample t-test

data:  x
t = 38.0565, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 1.885099 2.092484
sample estimates:
mean of x
 1.988791

> x <- rnorm(100, mean = 0, sd = 0.5)
> t.test(x)

        One Sample t-test

data:  x
t = 0.5835, df = 99, p-value = 0.5609
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.06811229  0.12486603
sample estimates:
 mean of x
0.02837687
Proportion test
In a poll ("yes"/"no") of 100 people, 43 say "yes". Is the population proportion 50%? (two-sided alternative)

H0: null hypothesis p = 0.5
H1: alternative hypothesis p ≠ 0.5
> prop.test(43, 100, p = 0.5)
        1-sample proportions test with continuity correction

data:  43 out of 100, null probability 0.5
X-squared = 1.69, df = 1, p-value = 0.1936
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.3326536 0.5327873
sample estimates:
   p
0.43
> prop.test(430, 1000, p = 0.5)
        1-sample proportions test with continuity correction

data:  430 out of 1000, null probability 0.5
X-squared = 19.321, df = 1, p-value = 1.105e-05
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.3991472 0.4613973
sample estimates:
   p
0.43
Wilcox-test
Rain distribution in Albacete (Spain). Asymmetric distribution (a t-test is not appropriate).

H0: µ = 5
H1: µ > 5

> x = c(12.8, 3.5, 2.9, 9.4, 8.7, 0.7, 0.2, 2.8, 1.9, 2.8, 3.1, 15.8)
> stem(x)
The decimal point is 1 digit(s) to the right of the |
  0 | 01233334
  0 | 99
  1 | 3
  1 | 6
> wilcox.test(x, mu = 5, alt = "greater")
        Wilcoxon signed rank test with continuity correction

data:  x
V = 39, p-value = 0.5156
alternative hypothesis: true location is greater than 5

The null hypothesis is not rejected.
t-test for two populations
t-statistic:
t = ((x̄₁ − x̄₂) − (µ₁ − µ₂)) / √(s₁²/n₁ + s₂²/n₂)
Assuming X1 and X2 normally distributed.

Equal variances:
> x = c(15, 10, 13, 7, 9, 8, 21, 9, 14, 8)
> y = c(15, 14, 12, 8, 14, 7, 16, 10, 15, 12)
> t.test(x, y, alt = "less", var.equal = TRUE)

        Two Sample t-test

data:  x and y
t = -0.5331, df = 18, p-value = 0.3002
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
     -Inf 2.027436
sample estimates:
mean of x mean of y
     11.4      12.3
Unequal variances:
> t.test(x, y, alt = "less")

        Welch Two Sample t-test

data:  x and y
t = -0.5331, df = 16.245, p-value = 0.3006
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
     -Inf 2.044664
sample estimates:
mean of x mean of y
     11.4      12.3
χ2-test
Allows statistical testing of categorical data
χ2-test:
χ² = Σᵢ₌₁ⁿ (fᵢ − eᵢ)² / eᵢ
Assumption: all expected counts are greater than 1 and at least 80% of them are greater than 5.
> freqs <- c(22, 21, 22, 27, 22, 36)
> probs <- rep(1/6, 6)
> chisq.test(freqs, p = probs)

        Chi-squared test for given probabilities

data:  freqs
X-squared = 6.72, df = 5, p-value = 0.2423

> freqs <- c(22, 31, 12, 37, 12, 36)
> probs <- rep(1/6, 6)
> chisq.test(freqs, p = probs)

        Chi-squared test for given probabilities

data:  freqs
X-squared = 25.92, df = 5, p-value = 9.248e-05
χ2-test II
Does a certain process follow a given distribution? (e.g. a die)
> freqs <- c(22, 21, 22, 27, 22, 36)
> probs <- rep(1/6, 6)
> chisq.test(freqs, p = probs)

        Chi-squared test for given probabilities

data:  freqs
X-squared = 6.72, df = 5, p-value = 0.2423

> freqs <- c(22, 31, 12, 37, 12, 36)
> probs <- rep(1/6, 6)
> chisq.test(freqs, p = probs)

        Chi-squared test for given probabilities

data:  freqs
X-squared = 25.92, df = 5, p-value = 9.248e-05
χ2-test III: homogeneity
Are two processes generated by the same distribution? (e.g. two dice, one fair and one loaded (ok/ko))
> dado.ok <- sample(1:6, 200, p = c(1, 1, 1, 1, 1, 1)/6, replace = T)
> dado.ko <- sample(1:6, 100, p = c(0.5, 0.5, 0.5, 0.5, 2, 2)/6, replace = T)
> freqs.ok <- table(dado.ok)
> freqs.ko = table(dado.ko)
> rbind(freqs.ok, freqs.ko)

          1  2  3  4  5  6
freqs.ok 29 25 42 39 40 25
freqs.ko  6 11  5 12 35 31
> chisq.test(rbind(freqs.ok, freqs.ko))
Pearson's Chi-squared test
data:  rbind(freqs.ok, freqs.ko)
X-squared = 35.5763, df = 5, p-value = 1.154e-06

> dado.ok <- sample(1:6, 200, p = c(1, 1, 1, 1, 1, 1)/6, replace = T)
> dado.ko <- sample(1:6, 100, p = c(1.1, 1, 1, 1.1, 1, 1)/6, replace = T)
> freqs.ok <- table(dado.ok)
> freqs.ko = table(dado.ko)
> rbind(freqs.ok, freqs.ko)

          1  2  3  4  5  6
freqs.ok 35 33 38 32 37 25
freqs.ko 12 19 13 18 14 24
> chisq.test(rbind(freqs.ok, freqs.ko))
Pearson's Chi-squared test
data:  rbind(freqs.ok, freqs.ko)
X-squared = 9.2915, df = 5, p-value = 0.09799
Durbin-Watson
Evaluates the Durbin-Watson statistic for error autocorrelation.

durbin.watson() in library(car): Durbin-Watson Test for Autocorrelated Errors
dwtest() in library(lmtest): Durbin-Watson Test

> library(lmtest)
> err1 <- rnorm(100)
> x <- rep(c(-1, 1), 50)
> y1 <- 1 + x + err1
> dwtest(y1 ~ x)
Durbin-Watson test
data:  y1 ~ x
DW = 1.8898, p-value = 0.3244
alternative hypothesis: true autocorrelation is greater than 0

> err2 <- filter(err1, 0.9, method = "recursive")
> y2 <- 1 + x + err2
> dwtest(y2 ~ x)
Durbin-Watson test
data:  y2 ~ x
DW = 0.2426, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0
Random numbers: quantile-quantile plots

> x = rnorm(100, 0, 1)
> qqnorm(x, main = "normal(0,1)")
> qqline(x)
[Figure: normal Q-Q plot of the normal(0,1) sample, with qqline]
> x = rnorm(100, 10, 15)
> qqnorm(x, main = "normal(10,15)")
> qqline(x)
[Figure: normal Q-Q plot of the normal(10,15) sample, with qqline]
Random numbers: quantile-quantile plots

> x = rexp(100, 1/10)
> qqnorm(x, main = "exponential mu=10")
> qqline(x)
[Figure: normal Q-Q plot of the exponential (mu = 10) sample, with qqline]
> x = runif(100, 0, 1)
> qqnorm(x, main = "unif(0,1)")
> qqline(x)
[Figure: normal Q-Q plot of the unif(0,1) sample, with qqline]
Linear Models
Assume a response variable Y, dependent on three predictors:

Y = f(X1, X2, X3) + ε

The simplest linear form:

Y = β0 + β1 X1 + β2 X2 + β3 X3 + ε

The predictors need not enter linearly, but the model must be linear in the parameters:

Y = β0 + β1 X1 + β2 log(X2) + β3 X3 X1 + ε

On the other hand:

Y = β0 + β1 X1^β2 + ε

is not linear.
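A minimal sketch of how such models are written with lm(); the simulated data frame d and its column names are assumptions for illustration only.

# Hypothetical simulated data: linear in the coefficients,
# even though x2 enters through log() and x1 interacts with x3
set.seed(1)
d <- data.frame(x1 = runif(50), x2 = runif(50, 1, 2), x3 = runif(50))
d$y <- 1 + 2 * d$x1 + 0.5 * log(d$x2) + rnorm(50, sd = 0.1)
fit <- lm(y ~ x1 + log(x2) + I(x3 * x1), data = d)
coef(fit)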
Linear modelling: matrix representation
Y = Xβ + ε
In matrix representation:

    [ y1 ]   [ 1  x11  x12  ...  x1P ]   [ β0 ]   [ ε1 ]
    [ y2 ] = [ 1  x21  x22  ...  x2P ] · [ β1 ] + [ ε2 ]     (1)
    [ ...]   [ ..................... ]   [ ...]   [ ...]
    [ yn ]   [ 1  xn1  xn2  ...  xnP ]   [ βP ]   [ εn ]
The most simple model is the null model:

    [ y1 ]   [ 1 ]       [ ε1 ]
    [ y2 ] = [ 1 ] · µ + [ ε2 ]     (2)
    [ ...]   [...]       [ ...]
    [ yn ]   [ 1 ]       [ εn ]
Find β so that X β is as close to Y as possible.
ŷ = X β̂

The residuals ε̂ live in a subspace of dimension n − p.
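In R, the design matrix X implied by a formula can be inspected with model.matrix(); a small sketch using the iris data (the choice of dataset is an assumption, not from the slide).

# The design matrix behind a formula: the first column is the intercept (all 1s)
data(iris)
X <- model.matrix(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
head(X)
dim(X)   # n rows, p columns (including the intercept)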
Geometrical representation
Galapagos
Galapagos Islands: 30 islands, 7 variables

Species: the number of species of tortoise found on the island
Area: the area of the island (km²)
Nearest: the distance from the nearest island (km)
Elevation: the highest elevation of the island (m)
Endemics: the number of endemic species
Scruz: the distance from Santa Cruz island (km)
Adjacent: the area of the adjacent island (km²)

> library(faraway)
> data(gala)
> head(gala)

             Species Endemics  Area Elevation Nearest Scruz Adjacent
Baltra            58       23 25.09       346     0.6   0.6     1.84
Bartolome         31       21  1.24       109     0.6  26.3   572.33
Caldwell           3        3  0.21       114     2.8  58.7     0.78
Champion          25        9  0.10        46     1.9  47.4     0.18
Coamano            2        1  0.05        77     1.9   1.9   903.82
Daphne.Major      18       11  0.34       119     8.0   8.0     1.84
lm()
> plot(Species ~ Elevation, data = gala)
[Figure: scatterplot of Species vs Elevation in the gala data]
lm(): model construction
> mdl <- lm(Species ~ Elevation, data = gala)
> coef(mdl)

(Intercept)   Elevation
 11.3351132   0.2007922

> plot(Species ~ Elevation, data = gala)
> abline(mdl, col = "blue")
[Figure: Species vs Elevation with the fitted regression line in blue]
lm(): model information
> summary(mdl)
Call:
lm(formula = Species ~ Elevation, data = gala)

Residuals:
     Min       1Q   Median       3Q      Max
-218.319  -30.721  -14.690    4.634  259.180

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.33511   19.20529   0.590     0.56
Elevation    0.20079    0.03465   5.795 3.18e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 78.66 on 28 degrees of freedom
Multiple R-squared: 0.5454,  Adjusted R-squared: 0.5291
F-statistic: 33.59 on 1 and 28 DF,  p-value: 3.177e-06
lm(): plot(lm)
> par(mfrow = c(2, 2))
> plot(mdl)
[Figure: lm() diagnostic plots for mdl: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance; SantaCruz, Fernandina and SantaMaria are labelled as extreme points]
lm(): Residuals
> resid(mdl)
      Baltra    Bartolome     Caldwell     Champion      Coamano Daphne.Major
  -22.809212    -2.221462   -31.225423     4.428446   -24.796112   -17.229384
Daphne.Minor       Darwin         Eden      Enderby     Espanola   Fernandina
   -6.008787   -35.068202   -17.591359   -31.823839    45.908033  -218.318650
    Gardner1     Gardner2     Genovesa      Isabela     Marchena       Onslow
   36.826069   -51.914941    13.404680    -7.087387   -29.206835   -14.354918
       Pinta       Pinzon   Las.Plazas       Rabida SanCristobal  SanSalvador
  -63.350647     4.702062   -18.209579   -15.025848   124.897677    43.747160
   SantaCruz      SantaFe   SantaMaria      Seymour      Tortuga         Wolf
  259.180432    -1.340291   145.157883     3.148434   -32.682461   -41.135538
lm(): Predictions
> newdata <- gala[15:nrow(gala), ]
> predict(mdl, newdata)

    Genovesa      Isabela     Marchena       Onslow        Pinta       Pinzon
    26.59532    354.08739     80.20684     16.35492    167.35065    103.29794
  Las.Plazas       Rabida SanCristobal  SanSalvador    SantaCruz      SantaFe
    30.20958     85.02585    155.10232    193.25284    184.81957     63.34029
  SantaMaria      Seymour      Tortuga         Wolf
   139.84212     40.85157     48.68246     62.13554
lm(): Predictions, confidence intervals
> predict(mdl, newdata, level = 0.9, interval = "confidence")
                   fit        lwr       upr
Genovesa      26.59532  -3.2897506  56.48039
Isabela      354.08739 271.4761884 436.69858
Marchena      80.20684  55.7314194 104.68225
Onslow        16.35492 -15.3566679  48.06650
Pinta        167.35065 133.0307302 201.67056
Pinzon       103.29794  78.2982338 128.29764
Las.Plazas    30.20958   0.9226656  59.49649
Rabida        85.02585  60.5948667 109.45683
SanCristobal 155.10232 123.2045765 187.00007
SanSalvador  193.25284 153.2255598 233.28012
SantaCruz    184.81957 146.7231442 222.91599
SantaFe       63.34029  38.0783575  88.60222
SantaMaria   139.84212 110.6221988 169.06203
Seymour       40.85157  13.1644062  68.53873
Tortuga       48.68246  21.9996242  75.36530
Wolf          62.13554  36.7813407  87.48974
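For intervals on individual new observations rather than on the mean response, interval = "prediction" can be requested; a short sketch assuming the mdl and newdata defined above.

# Prediction intervals are wider than confidence intervals:
# they also include the residual variability of single observations
predict(mdl, newdata, level = 0.9, interval = "prediction")[1:3, ]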
lm(): Model construction
> mdl <- lm(Species ~ Elevation + Endemics, data = gala)
> summary(mdl)

Call:
lm(formula = Species ~ Elevation + Endemics, data = gala)

Residuals:
   Min     1Q Median     3Q    Max
-74.85 -12.49   2.59  12.67  70.25

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.92862    7.14320  -2.790  0.00955 **
Elevation    -0.02294    0.02009  -1.142  0.26366
Endemics      4.35265    0.30997  14.042 6.29e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 27.8 on 27 degrees of freedom
Multiple R-squared: 0.9452,  Adjusted R-squared: 0.9412
F-statistic: 233 on 2 and 27 DF,  p-value: < 2.2e-16
lm(): Model construction
> mdl <- lm(Species ~ ., data = gala)
> summary(mdl)

Call:
lm(formula = Species ~ ., data = gala)

Residuals:
    Min      1Q  Median      3Q     Max
-68.219 -10.225   1.830   9.557  71.090

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -15.337942   9.423550  -1.628    0.117
Endemics      4.393654   0.481203   9.131 4.13e-09 ***
Area          0.013258   0.011403   1.163    0.257
Elevation    -0.047537   0.047596  -0.999    0.328
Nearest      -0.101460   0.500871  -0.203    0.841
Scruz         0.008256   0.105884   0.078    0.939
Adjacent      0.001811   0.011879   0.152    0.880
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 28.96 on 23 degrees of freedom
Multiple R-squared: 0.9494,  Adjusted R-squared: 0.9362
F-statistic: 71.88 on 6 and 23 DF,  p-value: 9.674e-14
Other regressors
Partial Least Squares: plsr() in library(pls)

> library(pls)
> data(yarn)
> mod <- plsr(density ~ NIR, ncomp = 10, data = yarn[yarn$train, ], validation = "CV")
> predplot(mod, ncomp = 1:6)
[Figure: predplot panels of measured vs predicted density for 1 to 6 PLS components (cross-validation)]
Other regressors
Principal Component Regression: pcr() in library(pls)

> data(yarn)
> mod <- pcr(density ~ NIR, ncomp = 10, data = yarn[yarn$train, ], validation = "CV")
> predplot(mod, ncomp = 1:6)
[Figure: predplot panels of measured vs predicted density for 1 to 6 principal components (cross-validation)]
Other regressors
Generally it is assumed:
ε is i.i.d. (independent and identically distributed), with var(ε) = σ²I

Residuals are normally distributed

When errors are not i.i.d.:

glm() from library(stats)

glm(model, family = "binomial") (the logistic version)

Independent errors, but not identically distributed:

WLS (weighted least squares), through glm()
Errors not normally distributed:
robust regression, through rlm() from library(MASS)
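A minimal sketch of the two alternatives mentioned above (logistic glm() and robust rlm()); the simulated data are an assumption for illustration only.

library(MASS)                      # provides rlm()
set.seed(2)
x <- rnorm(100)

# Logistic regression: binary response, binomial family
y.bin <- rbinom(100, 1, plogis(0.5 + 1.2 * x))
fit.logit <- glm(y.bin ~ x, family = "binomial")

# Robust regression: continuous response contaminated with a few outliers
y.out <- 1 + 2 * x + rnorm(100)
y.out[1:5] <- y.out[1:5] + 20
fit.rob <- rlm(y.out ~ x)

coef(fit.logit)
coef(fit.rob)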
One way anova
Generalization of t-test
H0: µ0 = µ1 = · · · = µp
> oneway.test(Sepal.Length ~ Species, data = iris)

        One-way analysis of means (not assuming equal variances)

data:  Sepal.Length and Species
F = 138.9083, num df = 2.000, denom df = 92.211, p-value < 2.2e-16

The p-value is small: we reject the null hypothesis of equal means.
[Figure: boxplots of Sepal.Length by Species]
Anova
> mdl <- lm(Sepal.Length ~ Species - 1, data = iris)
> mdl.null <- lm(Sepal.Length ~ 1, data = iris)
> anova(mdl, mdl.null)

Analysis of Variance Table

Model 1: Sepal.Length ~ Species - 1
Model 2: Sepal.Length ~ 1

  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1    147  38.956
2    149 102.168 -2   -63.212 119.26 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Principal Component Analysis: model
> mdl <- prcomp(iris[, -5], center = TRUE, scale = TRUE)
> summary(mdl)

Importance of components:
                        PC1   PC2    PC3     PC4
Standard deviation     1.71 0.956 0.3831 0.14393
Proportion of Variance 0.73 0.229 0.0367 0.00518
Cumulative Proportion  0.73 0.958 0.9948 1.00000

> mdl$rotation

                    PC1         PC2        PC3        PC4
Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971
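The variance explained per component can also be visualized directly; a small sketch assuming the mdl fitted above.

# Scree plot and proportion of variance from the component standard deviations
screeplot(mdl, type = "lines")
round(mdl$sdev^2 / sum(mdl$sdev^2), 3)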
Principal Component Analysis: prediction
> proj <- predict(mdl, iris[, -5])
> plot(proj)
[Figure: scatterplot of the projected observations, PC1 vs PC2]
Principal Component Analysis: prediction
> biplot(mdl)
[Figure: biplot of the PCA (PC1 vs PC2) showing observation indices and the loadings of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width]
> library(pls)
> scoreplot(mdl, col = as.numeric(iris$Species), pch = 16)
[Figure: score plot, PC1 (73%) vs PC2 (23%), points colored by Species]
End Part III