Analyzing Data with R

  • Published on
    14-Feb-2017


Transcript

  • ?read.table

    read.table(file, header = FALSE, sep = "", quote = "\"'",
               dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
               row.names, col.names, as.is = !stringsAsFactors,
               na.strings = "NA", colClasses = NA, nrows = -1,
               skip = 0, check.names = TRUE, fill = !blank.lines.skip,
               strip.white = FALSE, blank.lines.skip = TRUE,
               comment.char = "#",
               allowEscapes = FALSE, flush = FALSE,
               stringsAsFactors = default.stringsAsFactors(),
               fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

    header: FALSE | TRUE (whether the first line holds the column names)
    sep: "" (whitespace) | "," (CSV)
    fileEncoding: Shift-JIS | UTF-8 (character encoding of the file)

  • Read BMIdata.txt into DT and display DT

  • > DT <- read.table("BMIdata.txt")
    > DT
    # ...
    21 Jirou  M 191.5 76.4
    22 Tei    M 178.5 75.3
    23 Yumi   F 155.6 54.3
    24 Miki   F 164.2 63.2
    25 Sacho  F 158.3 52.3
    26 Taichi M 171.4 84.4
    27 Ichiro M 191.5 76.4
    28 Nobuo  M 178.5 75.3

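  • Show the first rows with head

    > head(DT)
         V1  V2     V3     V4
    1  Name Sex Height Weight
    2  Yuri   F  155.6   54.3
    3  Miwa   F  164.2   63.2
    4  Saki   F  158.3   52.3
    5 Taiki   M  171.4   84.4
    6 Tarou   M  191.5   76.4

    The column names (Name, Sex, ...) were read as a data row and the columns were named V1 to V4, so header = TRUE must be specified.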

  • Check with names and str

    > names(DT)
    [1] "V1" "V2" "V3" "V4"
    > str(DT)
    'data.frame': 28 obs. of 4 variables:
     $ V1: Factor w/ 19 levels "Aki","Daiki",..: 9 19 8 12 14 15 5 17 6 1 ...
     $ V2: Factor w/ 3 levels "F","M","Sex": 3 1 1 1 2 2 2 1 1 1 ...
     $ V3: Factor w/ 7 levels "155.6","158.3",..: 7 1 3 2 4 6 5 1 3 2 ...
     $ V4: Factor w/ 7 levels "52.3","54.3",..: 7 2 3 1 6 5 4 2 3 1 ...

    names: shows the column names.

    str: shows the structure of the data. Every column is a Factor, including Height and Weight, which should be numeric.

  • > DT <- read.table("BMIdata.txt", header = TRUE)
    > names(DT)
    [1] "Name" "Sex" "Height" "Weight"
    > str(DT)
    'data.frame': 27 obs. of 4 variables:
     $ Name  : Factor w/ 18 levels "Aki","Daiki",..: 18 8 11 13 14 ...
     $ Sex   : Factor w/ 2 levels "F","M": 1 1 1 2 2 2 1 1 1 2 ...
     $ Height: num 156 164 158 171 192 ...
     $ Weight: num 54.3 63.2 52.3 84.4 76.4 75.3 54.3 63.2 52.3 ...

    In the str output, Height and Weight have changed from Factor to num.
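The effect of header can be tried without the file itself. The sketch below inlines the first rows shown in the slides through the text argument of read.table; the variable names txt, DT1 and DT2 are made up for this illustration. On R 4.0.0 or later, strings are read as character rather than Factor by default, so the str output differs slightly from the slides.

```r
# First rows of BMIdata.txt as shown in the slides, inlined via text=
# so that the example does not depend on the file.
txt <- "Name Sex Height Weight
Yuri F 155.6 54.3
Miwa F 164.2 63.2
Saki F 158.3 52.3
Taiki M 171.4 84.4
Tarou M 191.5 76.4"

# Default header = FALSE: the header line is read as a data row,
# so every column holds text and is named V1 to V4.
DT1 <- read.table(text = txt)

# header = TRUE: the first line supplies the column names,
# and Height and Weight are read as numbers.
DT2 <- read.table(text = txt, header = TRUE)

names(DT1)
names(DT2)
str(DT2)
```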

  • CSV files and character encoding

    MS-Office Excel: Shift-JIS

    LibreOffice Calc: UTF-8

    Windows: Shift-JIS

    Mac/Linux: UTF-8

    # UTF-8 / Shift-JIS mismatch (Score.csv): incomplete final line found by readTableHeader on 'Score.csv'

    # Shift-JIS / UTF-8 mismatch (Score2.csv): incomplete final line found by readTableHeader on 'Score2.csv'

  • 1

    ()

    t- (10 )

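  • DataA.csv

    > DA <- read.csv("DataA.csv")
    > head(DA)
      X32.25    # the first value was taken as the column name
    1  34.27
    2  34.41
    ...
    > DA <- read.csv("DataA.csv", header = FALSE)
    > head(DA)
         V1     # the column is named V1
    1 32.25
    2 34.27

    The data are in column V1 of DA.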

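  • Shapiro-Wilk test

    Test DA$V1 for normality.

    > summary(DA$V1) # summary statistics
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      32.25   49.39   55.78   55.59   62.19   79.60
    > shapiro.test(DA$V1)

            Shapiro-Wilk normality test

    data:  DA$V1
    W = 0.99452, p-value = 0.6782

    W: the test statistic. p: the p-value. Here p = 0.6782 is above 0.05, so normality is not rejected.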
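A minimal sketch of how W and the p-value can be read from the test object. DataA.csv is not available here, so x is simulated data standing in for DA$V1; the seed, the sample size, and the 5% threshold are assumptions made for the illustration.

```r
set.seed(1)                           # arbitrary seed, for reproducibility only
x <- rnorm(200, mean = 55, sd = 10)   # simulated stand-in for DA$V1

res <- shapiro.test(x)
res$statistic   # W
res$p.value     # p-value

# Normality is rejected at the 5% level only when the p-value is below 0.05.
reject <- res$p.value < 0.05
reject
```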

  • The lattice package

    lattice: a graphics package for R

    > install.packages("lattice")
    Installing package into ...
    # ...

    After install.packages(), load the package with library(): library(lattice)

  • (1) Density plot

    > summary(DA$V1) # summary statistics
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      32.25   49.39   55.78   55.59   62.19   79.60
    > library(lattice) # load lattice
    > densityplot(DA$V1) # density plot

    [Figure: density plot; x-axis DT$V1, y-axis Density]

  • (2) Q-Q plot

    qqnorm draws a Q-Q plot (base R, not lattice).

    > qqnorm(DA$V1) # Q-Q plot
    > qqline(DA$V1) # add the reference line

    [Figure: Normal Q-Q Plot; x-axis Theoretical Quantiles, y-axis Sample Quantiles]

  • > histogram(DA$V1) # histogram
    > bwplot(DA$V1) # box-and-whisker plot (Box-Whisker plot, Box Plot)

    [Figure: histogram of DA$V1; y-axis Percent of Total]

    [Figure: box plot of DA$V1]

  • DataAB.txt: data for two groups, A and B (plotted with lattice)

        X Type
    43.69    A
    42.04    A
    39.85    A
    67.98    B
    58.00    B

    > DT <- read.table("DataAB.txt", header = TRUE)
    > str(DT)
    'data.frame': 400 obs. of 2 variables:
     $ X   : num 43.7 42 39.9 68 58 ...
     $ Type: Factor w/ 2 levels "A","B": 1 1 1 2 2 2 1 2 2 1 ...

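  • Two density plots with lattice: to draw one panel for each Type (A, B), put the conditioning variable after | in the formula ~ X.

    > library(lattice)
    > densityplot(~ X|Type, data=DT) # columns looked up in DT
    > densityplot(~ DT$X|DT$Type) # same plot, naming the columns directly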

    [Figure: density plots of X in two panels, A and B; y-axis Density]

    [Figure: normal Q-Q plots in two panels, A and B; x-axis Theoretical Quantiles, y-axis Sample Quantiles]

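  • Q-Q plots (without lattice)

    To draw two Q-Q plots with qqnorm, split the plotting area in two with par() and call qqnorm() once for each group.

    > par(mfrow=c(1,2))
    > XA <- DT$X[DT$Type == "A"]
    > XB <- DT$X[DT$Type == "B"]
    > qqnorm(XA,main="A") # title "A"
    > qqline(XA)
    > qqnorm(XB,main="B") # title "B"
    > qqline(XB)

    With lattice, use qqmath().

    qqmath() in lattice takes the same kind of formula as densityplot().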
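The lattice route can be sketched as follows. DataAB.txt is not available here, so DT is simulated with the same two columns (X and Type); the means, standard deviations and the names p_den and p_qq are assumptions for the illustration.

```r
library(lattice)

set.seed(1)   # arbitrary seed
# Simulated stand-in for DataAB.txt: a numeric X and a two-level Type.
DT <- data.frame(
  X    = c(rnorm(100, mean = 50, sd = 5), rnorm(100, mean = 60, sd = 15)),
  Type = factor(rep(c("A", "B"), each = 100))
)

# Both functions take the same formula: one panel per level of Type.
p_den <- densityplot(~ X | Type, data = DT)
p_qq  <- qqmath(~ X | Type, data = DT)   # normal Q-Q plot by default

# lattice functions return plot objects; printing one draws it:
# print(p_den)
# print(p_qq)
```

Because the plots are objects, nothing is drawn until they are printed, which happens automatically at the console but not inside a function or a loop.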

  • 2

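    [Figure: scatter plot of the points (x1,y1), (x2,y2), ..., (xi,yi) with the line y = a + b x; hi is the vertical distance between (xi,yi) and the point (xi,a+bxi) on the line; axes X and Y]

    Choose a and b so that the sum of the squared distances hi is as small as possible (least squares). With $s_x^2$ the variance of $x_1, x_2, \dots, x_n$ and $s_{xy}$ the covariance of $x$ and $y$:

    \[ b = \frac{s_{xy}}{s_x^2}, \qquad a = \bar{y} - b\,\bar{x} \]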
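The two formulas can be checked numerically. The sketch below uses the ten (X, Y) pairs listed later in these slides and compares the hand-computed a and b with the coefficients returned by lm(); the object name fit is chosen for this illustration.

```r
# The ten (X, Y) pairs from the regression example in the slides.
X <- c(11.04, 15.76, 17.72, 9.15, 10.1, 12.33, 4.2, 17.04, 10.5, 8.36)
Y <- c(21.03, 24.75, 31.28, 11.16, 18.89, 24.25, 10.57, 33.99, 21.01, 9.68)

b <- cov(X, Y) / var(X)       # slope: s_xy / s_x^2
a <- mean(Y) - b * mean(X)    # intercept: ybar - b * xbar

fit <- lm(Y ~ X)              # least-squares fit, for comparison

c(a = a, b = b)
coef(fit)
```

Both give an intercept of about -0.6092 and a slope of about 1.8305, the values reported in the summary output of the slides.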

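  • $y = a + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$

    The coefficients $b_1, b_2, \dots$ are given by

    \[
    \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix}
    =
    \begin{bmatrix}
    s_{x_1 x_1} & s_{x_1 x_2} & \cdots & s_{x_1 x_p} \\
    s_{x_2 x_1} & s_{x_2 x_2} & \cdots & s_{x_2 x_p} \\
    \vdots      & \vdots      & \ddots & \vdots      \\
    s_{x_p x_1} & s_{x_p x_2} & \cdots & s_{x_p x_p}
    \end{bmatrix}^{-1}
    \begin{bmatrix} s_{x_1 y} \\ s_{x_2 y} \\ \vdots \\ s_{x_p y} \end{bmatrix}
    \]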
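As a rough check of the matrix formula, the sketch below builds simulated data with three predictors (the coefficients, sample size and seed are made up), solves the covariance system directly, and compares the result with lm(). The intercept is taken as $a = \bar{y} - \sum_j b_j \bar{x}_j$, the usual extension of the single-predictor formula, which is not spelled out in the slides.

```r
set.seed(1)   # arbitrary seed
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n)   # made-up coefficients plus noise
X  <- cbind(x1, x2, x3)

S   <- cov(X)             # matrix of the covariances s_{x_j x_k}
sxy <- cov(X, y)          # column vector of the covariances s_{x_j y}
b   <- solve(S, sxy)      # solves S b = s_xy, i.e. b = S^{-1} s_xy
a   <- mean(y) - sum(colMeans(X) * b)

fit <- lm(y ~ x1 + x2 + x3)

c(a, b)
coef(fit)
```

solve(S, sxy) is used instead of forming the inverse explicitly, which gives the same coefficients with less rounding error.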

  • Example in R

        X     Y
    11.04 21.03
    15.76 24.75
    17.72 31.28
     9.15 11.16
    10.1  18.89
    12.33 24.25
     4.2  10.57
    17.04 33.99
    10.5  21.01
     8.36  9.68

    The data are stored in DT.

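  • result = lm(Y ~ X, data = DT)

    Fits a linear model (lm stands for linear model) and stores the fit in result. Y ~ X, data = DT means that Y is regressed on X, both taken from DT.

    summary(result)

    Shows the details of result.

    plot(Y ~ X, data = DT)

    Draws a scatter plot of X and Y from DT.

    abline(result)

    Adds the fitted regression line to the plot.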

  • Call:
    lm(formula = Y ~ X, data = DT)

    Residuals:
       Min     1Q Median     3Q    Max
    -5.014 -2.754  1.221  2.372  3.491

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -0.6092     3.4405  -0.177 0.863859
    X             1.8305     0.2800   6.538 0.000181 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 3.54 on 8 degrees of freedom
    Multiple R-squared: 0.8424, Adjusted R-squared: 0.8227
    F-statistic: 42.75 on 1 and 8 DF, p-value: 0.000180

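  • Residuals: minimum, quartiles and maximum of the residuals

    Coefficients: for the Intercept and for X, the estimate (Estimate), its standard error (Std. Error), the t statistic (t value), and Pr(>|t|), the p-value of the t test

    Residual standard error: standard error of the residuals

    Multiple R-squared: coefficient of determination ($R^2$)

    Adjusted R-squared: $R^2$ adjusted for the degrees of freedom

    F-statistic: the F statistic and its p-value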
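Each of these quantities can also be pulled out of the summary object by name, which is handy when a value is needed in later calculations. A short sketch using the ten (X, Y) pairs from the slides; the object name s is chosen for this illustration.

```r
# The ten (X, Y) pairs from the slides.
DT <- data.frame(
  X = c(11.04, 15.76, 17.72, 9.15, 10.1, 12.33, 4.2, 17.04, 10.5, 8.36),
  Y = c(21.03, 24.75, 31.28, 11.16, 18.89, 24.25, 10.57, 33.99, 21.01, 9.68)
)

result <- lm(Y ~ X, data = DT)
s <- summary(result)

coef(s)           # matrix: Estimate, Std. Error, t value, Pr(>|t|)
s$sigma           # residual standard error
s$r.squared       # Multiple R-squared
s$adj.r.squared   # Adjusted R-squared
s$fstatistic      # F statistic with numerator and denominator df
```

With these data, s$r.squared is about 0.8424 and s$sigma about 3.54, matching the printed summary.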

  • TestScore.txt

    Eng Math Sci Art
     84   58  87  47
     84   59  89  54
     86   59  90  50
     87   63  94  55
     83   60  88  51

     83   60  88  50
     84   60  90  54
     82   60  86  50
     82   60  88  52
     85   63  90  53

    (30 rows in total)

    Predict Art from Eng, Math, and Sci.

  • The data are stored in Result.

  • [Figure: scatter-plot matrix of Eng, Math, Sci, and Art]

    [Figure: the same scatter-plot matrix with correlation coefficients shown in the panels: 0.30, 0.62, 0.58, 0.46, 0.62, 0.69]

    cor(Result): correlation matrix. pairs(Result): scatter-plot matrix.
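A small sketch of the two calls. Only ten of the 30 rows of TestScore.txt are visible in the slides, so the correlations computed here will not match the values in the figure; the object name R is chosen for this illustration.

```r
# The ten rows of TestScore.txt visible in the slides (the full file has 30).
Result <- read.table(header = TRUE, text = "Eng Math Sci Art
84 58 87 47
84 59 89 54
86 59 90 50
87 63 94 55
83 60 88 51
83 60 88 50
84 60 90 54
82 60 86 50
82 60 88 52
85 63 90 53")

R <- cor(Result)   # 4 x 4 matrix of Pearson correlations
round(R, 2)

# pairs(Result)    # draws the scatter-plot matrix
```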

  • > Result.fit <- lm(Art ~ Eng + Math + Sci, data = Result)
    > summary(Result.fit)
    ...
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -51.8052    19.2769  -2.687   0.0124 *
    Eng           0.1165     0.2475   0.471   0.6418
    Math          0.6130     0.2873   2.133   0.0425 *
    Sci           0.6383     0.2835   2.251   0.0330 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 2.385 on 26 degrees of freedom
    Multiple R-squared: 0.554, Adjusted R-squared: 0.5025
    F-statistic: 10.76 on 3 and 26 DF, p-value: 8.859e-05

    Math and Sci are significant at the 5% level; Eng is not.

  •                  m0 m25 m50 m75 w0 w25 w50 w75
    Algeria          63  51  30  13 67  54  34  15
    Cameroon         34  29  13   5 38  32  17   6
    Madagascar       38  30  17   7 38  34  20   7
    Mauritius        59  42  20   6 64  46  25   8
    Reunion          56  38  18   7 62  46  25  10
    Seychelles       62  44  24   7 69  50  28  14
    South Africa(B)  50  39  20   7 55  43  23   8
    South Africa(W)  65  44  22   7 72  50  27   9
    Tunisia          56  46  24  11 63  54  33  19
    Canada           69  47  24   8 75  53  29  10
    Costa Rica       65  48  26   9 68  50  27  10
    Dominican Rep    64  50  28  11 66  51  29  11
    . . . . . . . . . . . . . . . . . . . . . . . . . . .
    Ecuador          57  46  28   9 60  49  28  11

    31 countries; mxx and wxx are the life expectancy of men and women at age xx (in years).

  • R ## life

  • [Figure: cluster dendrogram of the countries; hclust (*, "complete")]
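A dendrogram of this kind can be produced as sketched below. Only the 13 rows visible in the slides are used, so the tree will differ from the one in the figure; Euclidean distance is an assumption (it is the default of dist()), while complete linkage follows the figure label. The column name country and the objects d and hc are made up for the illustration.

```r
# The 13 rows of the life-expectancy table visible in the slides.
life <- read.table(header = TRUE, row.names = 1, sep = ",",
  text = "country,m0,m25,m50,m75,w0,w25,w50,w75
Algeria,63,51,30,13,67,54,34,15
Cameroon,34,29,13,5,38,32,17,6
Madagascar,38,30,17,7,38,34,20,7
Mauritius,59,42,20,6,64,46,25,8
Reunion,56,38,18,7,62,46,25,10
Seychelles,62,44,24,7,69,50,28,14
South Africa(B),50,39,20,7,55,43,23,8
South Africa(W),65,44,22,7,72,50,27,9
Tunisia,56,46,24,11,63,54,33,19
Canada,69,47,24,8,75,53,29,10
Costa Rica,65,48,26,9,68,50,27,10
Dominican Rep,64,50,28,11,66,51,29,11
Ecuador,57,46,28,9,60,49,28,11")

d  <- dist(life)                       # Euclidean distances between countries
hc <- hclust(d, method = "complete")   # complete linkage, as in the figure label

# plot(hc)                             # draws the dendrogram
cutree(hc, k = 2)                      # group membership when cut into 2 clusters
```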

  • TibetScull.txt: skull measurements and Type

         Length Breadth Height Fheight Fbreadth Type
    "1"   190.5   152.5  145      73.5    136.5  "1"
    "2"   172.5   132    125.5    63      121    "1"
    "3"   167     130    125.5    69.5    119.5  "1"
    "4"   169.5   150.5  133.5    64.5    128    "1"
    "5"   175     138.5  126      77.5    135.5  "1"
    . . . . . . . . . . . . . . . . . . . . .
    "19"  179.5   135    128.5    74      132    "2"
    "20"  191     140.5  140.5    72.5    131.5  "2"
    "21"  184.5   141.5  134.5    76.5    141.5  "2"
    . . . . . . . . . . . . . . . . . . . . .
    "31"  197     131.5  135      80.5    139    "2"
    "32"  182.5   131    135      68.5    136    "2"

  •   Length Breadth Height Fheight Fbreadth
    A  171.0   140.5  127.0    69.5    137.0
    B  179.0   132.0  140.0    72.0    138.5

    library(MASS) # load the MASS package; the data are in DT

  • The probabilities that A and B belong to Type 1 are 0.755 and 0.174, respectively.

    $class
    [1] 1 2
    Levels: 1 2

    $posterior
              1         2
    A 0.7545066 0.2454934
    B 0.1741016 0.8258984
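Output of this shape ($class and $posterior) is what predict() returns for a model fitted with lda() from MASS. The slides do not show the call, so the use of lda() here is an assumption, and the sketch uses only the ten rows of TibetScull.txt that are visible; its numbers therefore will not reproduce the 0.755 and 0.174 above. The names new, fit and pred are made up for the illustration.

```r
library(MASS)

# Only the ten rows of TibetScull.txt visible in the slides (the file has 32).
DT <- read.table(header = TRUE, text = "Length Breadth Height Fheight Fbreadth Type
190.5 152.5 145   73.5 136.5 1
172.5 132   125.5 63   121   1
167   130   125.5 69.5 119.5 1
169.5 150.5 133.5 64.5 128   1
175   138.5 126   77.5 135.5 1
179.5 135   128.5 74   132   2
191   140.5 140.5 72.5 131.5 2
184.5 141.5 134.5 76.5 141.5 2
197   131.5 135   80.5 139   2
182.5 131   135   68.5 136   2")
DT$Type <- factor(DT$Type)

# The two new skulls A and B from the slides.
new <- data.frame(
  Length   = c(171.0, 179.0),
  Breadth  = c(140.5, 132.0),
  Height   = c(127.0, 140.0),
  Fheight  = c(69.5, 72.0),
  Fbreadth = c(137.0, 138.5),
  row.names = c("A", "B")
)

fit  <- lda(Type ~ ., data = DT)       # linear discriminant analysis (assumed method)
pred <- predict(fit, newdata = new)

pred$class       # predicted Type for A and B
pred$posterior   # posterior probability of each Type
```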

  • Musicchoice.txt

    39 45 21 83 68 47 53 51 65 41 32 55

    $\chi^2$ test

    The data are stored in MData.

  • Pearson's Chi-squared test

    data:  MData
    X-squared = 25.8888, df = 6, p-value = 0.0002335

    p = 0.00024

    $\chi^2$ = 25.9 with degrees of freedom (df) = 6: significant at the 0.5% level
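The test can be rerun from the twelve counts. The slides do not show how the counts are arranged, so the 4 x 3 row-wise layout below is an assumption; it does reproduce the reported statistic (its transpose would give the same result). The object name res is chosen for this illustration.

```r
# The twelve counts from Musicchoice.txt, arranged as 4 rows x 3 columns
# (assumed layout; the slides list the numbers without row or column labels).
MData <- matrix(c(39, 45, 21,
                  83, 68, 47,
                  53, 51, 65,
                  41, 32, 55),
                nrow = 4, byrow = TRUE)

res <- chisq.test(MData)   # Pearson's chi-squared test of independence

res$statistic   # X-squared
res$parameter   # degrees of freedom: (4 - 1) * (3 - 1) = 6
res$p.value
res$expected    # expected counts under independence

qchisq(0.995, df = 6)   # upper 0.5% point of the chi-squared distribution, df = 6
```

The statistic of about 25.89 lies well above the upper 0.5% point (about 18.55), in line with the p-value of 0.00023.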

