Data Analysis Using R: 3. Graphical Analyses
Tuan V. Nguyen
Garvan Institute of Medical Research,
Sydney, Australia
Overview
• Data
• Barchart
• Histogram
• Stripchart
• Boxplot
• Scatter plot
Data
• Body composition data measured by dual energy X-ray absorptiometry
• 43 men and women, aged between 11 and 28
• Variable names:
  – id
  – age
  – sex
  – dur
  – weight
  – height
  – lm (lean mass)
  – pclm (percent lean mass)
  – fm (fat mass)
  – pcfm (percent fat mass)
  – bmc (bone mineral content)
Reading data into R
setwd("c:/works/stats")
bc <- read.table("comp.txt", header=TRUE)
attach(bc)
names(bc)
[1] "id" "age" "sex" "dur" "weight" "height" "lm" "pclm"
[9] "fm" "pcfm" "bmc"
View data
bc
   id age sex dur weight height    lm  pclm   fm pcfm  bmc
1   1  15   M   5     39    148 32.96 84.50 4.86 12.5 1.33
2   2  16   M   8     45    162 38.16 84.80 4.15  9.2 1.89
3   3  11   M   4     23    132 18.51 80.50 2.99 13.0 0.74
4   4  19   M   9     46    159 35.92 78.10 6.73 14.6 1.59
5   5  19   M   6     56    166 46.63 83.00 5.61 10.2 2.56
6   6  22   M  12     50    152 42.13 84.00 3.93  8.1 2.12
7   7  16   M   8     53    170 45.23 85.00 5.15  9.8 2.21
8   8  12   M   5     35    151 25.26 72.20 9.02 25.6 0.95
9   9  21   M   8     46    166 39.44 85.70 4.64 10.1 2.00
10 10  15   M   6     45    165 38.47 85.50 3.92  8.9 1.70
11 11  13   M   5     32    142 25.50 79.70 4.26 13.9 0.99
12 12  20   M   6     40    153 32.70 82.00 4.66 12.0 1.38
...
40 40  12   M  10     39    155 33.00 84.60 3.50  9.2 1.43
41 41  15   M   6     45    154 36.00 80.00 5.33 12.5 1.52
42 42  22   M   7     46    157 38.50 84.00 4.63 10.3 1.86
43 43  25   M  13     45    162 37.35 83.00 4.34 10.0 1.70
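As a minimal sketch of the same steps without attach(), the columns can be reached through the data frame itself. The three data rows below are copied from the listing above so that the block runs without comp.txt; the names txt, bc_sub and mean_lm are assumptions made for illustration.

```r
# Three rows copied from the printed listing; the full comp.txt has 43 rows.
txt <- "id age sex dur weight height lm pclm fm pcfm bmc
1 15 M 5 39 148 32.96 84.50 4.86 12.5 1.33
2 16 M 8 45 162 38.16 84.80 4.15 9.2 1.89
3 11 M 4 23 132 18.51 80.50 2.99 13.0 0.74"

# read.table() can read from a character string through its text argument
bc_sub <- read.table(text = txt, header = TRUE)

str(bc_sub)        # column names and types
head(bc_sub, 2)    # first two rows only

# Refer to columns with $ or with(), so nothing is attached to the search path
summary(bc_sub$lm)
mean_lm <- with(bc_sub, mean(lm))
mean_lm
```

Working through bc_sub$lm or with() keeps a column such as lm from being confused with the lm() function later on.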
Counting: barplot
freq <- table(sex)
barplot(freq)
barplot(freq, horiz=TRUE, main="Sex distribution")
[Figure: vertical bar plot of counts by sex (F, M); horizontal bar plot of the same counts titled "Sex distribution"]
Counting by group: barplot
agegroup <- cut(age, 3)
agesex <- table(sex, agegroup)
barplot(agesex)
[Figure: stacked bar plot of counts by sex within age groups (11,16.7], (16.7,22.3] and (22.3,28]]
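cut(age, 3) divides the observed age range into three intervals of equal width, which is why the break points fall at values such as 16.7 and 22.3. A hedged sketch of supplying the break points explicitly instead: the ages are those of the first 12 subjects in the data listing, and the break points, labels and object names are assumptions chosen for illustration.

```r
# Ages of the first 12 subjects shown in the data listing
age_sub <- c(15, 16, 11, 19, 19, 22, 16, 12, 21, 15, 13, 20)

# Three equal-width intervals chosen by cut()
grp_auto <- cut(age_sub, 3)
levels(grp_auto)

# Hypothetical break points and labels; intervals are (10,16], (16,22], (22,28]
grp_manual <- cut(age_sub,
                  breaks = c(10, 16, 22, 28),
                  labels = c("11-16", "17-22", "23-28"))
table(grp_manual)
```

Explicit breaks keep the group boundaries fixed when the data change, whereas a single number lets them move with the observed range.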
Counting by group: barplot
agegroup <- cut(age, 3)
agesex <- table(sex, agegroup)
barplot(agesex, xlab="Age group")
barplot(agesex, beside=TRUE, xlab="Age group")
[Figure: stacked bar plot and side-by-side bar plot of counts by sex within age groups (11,16.7], (16.7,22.3] and (22.3,28]; x-axis: Age group]
Distribution of data: Histogram
par(mfrow=c(2,2))
hist(age)
hist(age, breaks=20)
hist(age, breaks=40)
hist(age, breaks=50)
[Figure: 2 x 2 panel of histograms of age, drawn with the default breaks and with breaks=20, 40 and 50; x-axis: age, y-axis: Frequency]
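When breaks is a single number, hist() treats it as a suggestion and picks rounded ("pretty") break points, which may be why the panels for 20, 40 and 50 breaks look so similar. A minimal sketch of the difference between a suggested count and an explicit vector of break points: the ages are those of the first 12 subjects in the data listing, and the 2-year bin width is an assumption for illustration.

```r
# Ages of the first 12 subjects shown in the data listing
age_sub <- c(15, 16, 11, 19, 19, 22, 16, 12, 21, 15, 13, 20)

# A single number is only a suggestion; inspect what hist() actually used
h_suggest <- hist(age_sub, breaks = 50, plot = FALSE)
length(h_suggest$breaks) - 1    # number of bins actually drawn

# A vector of break points is used exactly as given (hypothetical 2-year bins)
h_exact <- hist(age_sub, breaks = seq(10, 24, by = 2), plot = FALSE)
h_exact$breaks
h_exact$counts
```

Setting plot = FALSE returns the histogram object without drawing it, which is a convenient way to check the bins before plotting.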
Distribution of data: Histogram
par(mfrow=c(2,2))
hist(age)
hist(weight)
hist(lm)
hist(fm)
[Figure: 2 x 2 panel of histograms of age, weight, lm and fm; y-axis: Frequency]
Distribution of data: plot(density)
hist(lm, main="Distribution of lean mass")
plot(density(lm), main="Distribution of lean mass")
[Figure: histogram of lm titled "Distribution of lean mass" (x-axis: lm, y-axis: Frequency); kernel density plot titled "Distribution of lean mass" (N = 43, Bandwidth = 2.607; y-axis: Density)]
Normal distribution? qqnorm
qqnorm(lm)
[Figure: Normal Q-Q Plot of lm; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]
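qqnorm() draws only the points. A reference line can make departures from normality easier to judge; in base R, qqline() adds a line through the first and third quartiles. The sketch below uses the lean mass values of the first 12 subjects in the data listing, so its plot will differ from the one for all 43 subjects; the names lm_sub and qq are assumptions.

```r
# Lean mass (kg) of the first 12 subjects shown in the data listing
lm_sub <- c(32.96, 38.16, 18.51, 35.92, 46.63, 42.13,
            45.23, 25.26, 39.44, 38.47, 25.50, 32.70)

# qqnorm() returns the plotted coordinates invisibly
qq <- qqnorm(lm_sub, main = "Normal Q-Q Plot (12 subjects)")

# Add the reference line through the quartiles
qqline(lm_sub, col = 2)
```

Points that follow the line closely are consistent with a normal distribution; systematic curvature at the ends suggests skewness or heavy tails.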
Continuity of data: stripchart
stripchart(lm, xlab="Lean mass; kg")
[Figure: strip chart of lm; x-axis: Lean mass; kg]
Summary of continuous data: boxplot
boxplot(fm)
boxplot(lm)
[Figure: box plots of lean mass (lm) and fat mass (fm)]

      Min.  1st Qu.  Median   Mean  3rd Qu.   Max.
LM   18.51    31.91   35.92  35.65    40.14  46.63
FM   2.990    4.250   5.270  6.500    8.795 12.800
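The figures listed for LM and FM have the form that summary() returns. As a sketch, boxplot() called with plot = FALSE returns the values that a box plot draws, including any points flagged as outliers. The fat mass values are those of the first 12 subjects in the data listing, so the results differ from the full-sample figures; the names fm_sub and b are assumptions.

```r
# Fat mass (kg) of the first 12 subjects shown in the data listing
fm_sub <- c(4.86, 4.15, 2.99, 6.73, 5.61, 3.93,
            5.15, 9.02, 4.64, 3.92, 4.26, 4.66)

# Min, quartiles, median, mean and max
summary(fm_sub)

# The numbers behind the box plot, without drawing it
b <- boxplot(fm_sub, plot = FALSE)
b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points beyond the whiskers, drawn individually
```

Comparing b$stats with summary() shows that the box edges are hinges, which can differ slightly from the quartiles that summary() reports.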
Summary of data by group: boxplot
boxplot(fm ~ sex)
boxplot(lm ~ sex)
[Figure: box plots by sex (F, M), captioned "Lean mass by sex" and "Fat mass by sex"]
Analysis of association: scatter plot
plot(lm ~ age)
plot(lm ~ age, pch=16)
[Figure: scatter plots of lm against age, with open circles and with filled circles (pch=16); x-axis: age, y-axis: lm]
Analysis of association: scatter plot
line <- lm(lm ~ age)
plot(lm ~ age, pch=16)
abline(line)
[Figure: scatter plot of lm against age with the fitted regression line; x-axis: age, y-axis: lm]
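In lm(lm ~ age) the name lm refers both to the model-fitting function and to the lean mass column, which works but is easy to misread. A minimal sketch that passes the data frame through the data argument and inspects the fitted coefficients: the values are those of the first 12 subjects in the data listing, and the names bc_sub and fit are assumptions.

```r
# Age and lean mass of the first 12 subjects shown in the data listing
bc_sub <- data.frame(
  age = c(15, 16, 11, 19, 19, 22, 16, 12, 21, 15, 13, 20),
  lm  = c(32.96, 38.16, 18.51, 35.92, 46.63, 42.13,
          45.23, 25.26, 39.44, 38.47, 25.50, 32.70)
)

# Variables in the formula are looked up in bc_sub, so attach() is not needed
fit <- lm(lm ~ age, data = bc_sub)
coef(fit)   # intercept and slope of the fitted line

plot(lm ~ age, data = bc_sub, pch = 16)
abline(fit) # draw the fitted line on the scatter plot
```

The slope in coef(fit) is the estimated change in lean mass per year of age for these 12 subjects only.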
Analysis of association by group: scatter plot
plot(lm ~ age, pch=ifelse(sex=="M", "M", "F"), xlab="Age", ylab="Kg")
[Figure: scatter plot of lean mass against age, each point drawn as "M" or "F" according to sex; x-axis: Age, y-axis: Kg]
Analysis of multiple associations
data <- data.frame(age, weight, lm, fm, bmc)
pairs(data)
[Figure: scatter plot matrix of age, weight, lm, fm and bmc]
Analysis of multiple associations: a fancier graph
matrix.cor <- function(x, y, digits=2, prefix="", cex.cor)
{
    # Save the user coordinates and restore them when the panel is done
    usr <- par("usr"); on.exit(par(usr = usr))
    par(usr = c(0, 1, 0, 1))
    # Absolute correlation, formatted to the requested number of digits
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits=digits)[1]
    txt <- paste(prefix, txt, sep="")
    # Use cex.cor when supplied; otherwise scale the text to fit the panel
    if(missing(cex.cor)) cex <- 0.8/strwidth(txt) else cex <- cex.cor
    test <- cor.test(x,y)
    # borrowed from printCoefmat
    Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
                     cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                     symbols = c("***", "**", "*", ".", " "))
    text(0.5, 0.5, txt, cex = cex * r)
    text(.8, .8, Signif, cex=cex, col=2)
}
pairs(data,lower.panel=panel.smooth, upper.panel=matrix.cor)
Results
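[Figure: scatter plot matrix of age, weight, lm, fm and bmc, with smoothed scatter plots in the lower panels and absolute correlation coefficients with significance stars in the upper panels]

Absolute correlation coefficients shown in the upper panels (*** p < 0.001, ** p < 0.01, * p < 0.05):

         weight    lm        fm       bmc
age      0.48**    0.36*     0.095    0.56***
weight             0.88***   0.11     0.85***
lm                           0.36*    0.86***
fm                                    0.16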
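The panel function prints abs(cor(x, y)), so the sign of each association is not visible in the plot. As a hedged sketch, cor() gives the signed correlation matrix and cor.test() gives the p-value for a single pair. The values are those of the first 12 subjects in the data listing, so the coefficients differ from the full-sample results; the names d, r and ct are assumptions.

```r
# Five variables for the first 12 subjects shown in the data listing
d <- data.frame(
  age    = c(15, 16, 11, 19, 19, 22, 16, 12, 21, 15, 13, 20),
  weight = c(39, 45, 23, 46, 56, 50, 53, 35, 46, 45, 32, 40),
  lm     = c(32.96, 38.16, 18.51, 35.92, 46.63, 42.13,
             45.23, 25.26, 39.44, 38.47, 25.50, 32.70),
  fm     = c(4.86, 4.15, 2.99, 6.73, 5.61, 3.93,
             5.15, 9.02, 4.64, 3.92, 4.26, 4.66),
  bmc    = c(1.33, 1.89, 0.74, 1.59, 2.56, 2.12,
             2.21, 0.95, 2.00, 1.70, 0.99, 1.38)
)

# Signed Pearson correlations between every pair of columns
r <- cor(d)
round(r, 2)

# Test of one pair: lean mass against bone mineral content
ct <- cor.test(d$lm, d$bmc)
ct$estimate   # correlation coefficient
ct$p.value    # p-value of the test that the correlation is zero
```

Reading the signed matrix next to the plot helps separate positive from negative associations of the same strength.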
Summary
• R is a very powerful package for graphical analysis
• First step in data analysis: graphical analysis
• Look for
  – Distributions
  – Differences
  – Associations