Descriptive Statistics Using SAS

  • 7/25/2019 Descriptive Statistics Using SAS

    Descriptive Statistics using SAS

    PROC MEANS

    See www.stattutorials.com/SASDATA for files mentioned in this tutorial. TexaSoft, 2007

    These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SAS Software.

    Preliminary information about PROC MEANS

    PROC MEANS produces descriptive statistics (means, standard deviation, minimum, maximum, etc.) for numeric variables in a set of data. PROC MEANS can be used for

    Describing continuous data where the average has meaning

    Describing the means across groups

    Searching for possible outliers or incorrectly coded values

    Performing a single sample t-test

    The syntax of the PROC MEANS statement is

    PROC MEANS <options>; <statements>;

    Statistical options that may be requested are (the default statistics are N, MEAN, STD, MIN, and MAX):

    N - Number of observations
    NMISS - Number of missing observations
    MEAN - Arithmetic average
    STD - Standard deviation
    MIN - Minimum (smallest)
    MAX - Maximum (largest)
    RANGE - Range
    SUM - Sum of observations
    VAR - Variance
    USS - Uncorrected sum of squares
    CSS - Corrected sum of squares
    STDERR - Standard error
    T - Student's t value for testing Ho: mean = 0
    PRT - P-value associated with the t-test above
    SUMWGT - Sum of the WEIGHT variable values

    (New to version 8.0)

    MEDIAN - 50th percentile
    P1 - 1st percentile
    P5 - 5th percentile
    P10 - 10th percentile
    P90 - 90th percentile
    P95 - 95th percentile
    P99 - 99th percentile
    Q1 - 1st quartile
    Q3 - 3rd quartile
    QRANGE - Quartile range

    Other commonly used options available in PROC MEANS include

    DATA= Specify data set to use
    NOPRINT Do not print output
    MAXDEC=n Use n decimal places to print output

    Commonly used statements with PROC MEANS include

    BY variable list -- Statistics are reported for groups in separate tables
    CLASS variable list -- Statistics reported by groups in a single table
    VAR variable list -- specifies which numeric variables to use
    OUTPUT OUT=datasetname -- statistics will be output to a SAS data file
    FREQ variable -- specifies a variable that represents a count of observations
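    The FREQ statement is the only statement in the list above that the later examples do not use, so here is a minimal sketch of it. The data set SCORES, its variables SCORE and COUNT, and the values are hypothetical and exist only for this illustration.

```sas
* Sketch only: SCORES, SCORE and COUNT are made-up names and values;
DATA SCORES;
INPUT SCORE COUNT;
DATALINES;
10 3
20 5
30 2
;
* Each row stands for COUNT observations that share the same SCORE;
PROC MEANS DATA=SCORES N MEAN MAXDEC=2;
VAR SCORE;
FREQ COUNT;
RUN;
```

    With these made-up values PROC MEANS should report N as 10 (3 + 5 + 2) and the mean as 19.00, because each SCORE is counted COUNT times.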

    A few quick examples of PROC MEANS

    * Simplest invocation - on all numeric variables *;
    PROC MEANS;

    * Specified statistics and variables *;
    PROC MEANS N MEAN STD;
    VAR SODIUM CARBO;

    * Subgroup descriptive statistics using by statement *;
    PROC SORT; BY SEX;
    PROC MEANS; BY SEX;
    VAR FAT PROTEIN SODIUM;

    * Subgroup descriptive statistics using class statement *;
    PROC MEANS; CLASS SEX;
    VAR FAT PROTEIN SODIUM;

    Example 1: A simple use of PROC MEANS

    This example calculates the means of several specified variables, limiting the output to two decimal places. (PROCMEANS1.SAS)

    ****************************************************************
    * Data on weight, height, and age of a random sample of 12     *
    * nutritionally deficient children                             *
    ****************************************************************;
    DATA CHILDREN;
    INPUT WEIGHT HEIGHT AGE;
    DATALINES;
    64 57 8
    71 59 10
    53 49 6
    67 62 11
    55 51 8
    58 50 8
    77 55 10
    57 48 9
    56 42 10
    51 42 6
    76 61 12
    68 57 9
    ;
    ODS RTF;

    proc means;
    Title 'Example 1a - PROC MEANS, simplest use';
    run;

    proc means maxdec=2;
    var WEIGHT HEIGHT;
    Title 'Example 1b - PROC MEANS, limit decimals, specify variables';
    run;
    ODS RTF CLOSE;

    Output for Example 1:

    Example 1a - PROC MEANS, simplest use

    Variable     N          Mean       Std Dev       Minimum       Maximum
    WEIGHT      12    62.7500000     8.9861004    51.0000000    77.0000000
    HEIGHT      12    52.7500000     6.8240884    42.0000000    62.0000000
    AGE         12     8.9166667     1.8319554     6.0000000    12.0000000

    Example 1b - PROC MEANS, limit decimals, specify variables

    Variable     N     Mean    Std Dev    Minimum    Maximum
    WEIGHT      12    62.75       8.99      51.00      77.00
    HEIGHT      12    52.75       6.82      42.00      62.00

    Example 1c - PROC MEANS, specify statistics to report

    Variable     N     Mean    Std Error    Median
    WEIGHT      12    62.75         2.59     61.00
    HEIGHT      12    52.75         1.97     53.00
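    The program listing above does not show the step behind the Example 1c output. A PROC MEANS call along the following lines would request those four statistics. This is a sketch that assumes the CHILDREN data set from Example 1 has already been created; the statistic keywords are taken from the list earlier in this tutorial, and the title text is copied from the output heading.

```sas
* Sketch only: assumes the CHILDREN data set from Example 1 already exists;
PROC MEANS DATA=CHILDREN MAXDEC=2 N MEAN STDERR MEDIAN;
VAR WEIGHT HEIGHT;
TITLE 'Example 1c - PROC MEANS, specify statistics to report';
RUN;
```

    Listing statistic keywords on the PROC MEANS statement replaces the default set, which is why Std Dev, Minimum and Maximum are absent from the 1c table.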

    Example 2: Using PROC MEANS with "By Group" and Class statements

    This example uses PROC MEANS to calculate means for an entire data set or by grouping variables. (PROCMEANS2.SAS)

    **************************************************
    * Example 2 for PROC MEANS                       *
    **************************************************;
    DATA FERTILIZER;
    INPUT FEEDTYPE WEIGHTGAIN;
    DATALINES;
    1 46.2
    1 55.6
    1 53.3
    1 44.8
    1 55.4
    1 56.0
    1 48.9
    2 51.3
    2 52.4
    2 54.6
    2 52.2
    2 64.3
    2 55.0
    ;
    ODS RTF;
    PROC SORT DATA=FERTILIZER; BY FEEDTYPE;
    PROC MEANS; VAR WEIGHTGAIN; BY FEEDTYPE;
    TITLE 'Summary statistics by group';
    RUN;
    PROC MEANS; VAR WEIGHTGAIN; CLASS FEEDTYPE;
    TITLE 'Summary statistics USING CLASS';
    RUN;
    ODS RTF CLOSE;

    Output for this SAS code is:

    Summary statistics by group

    FEEDTYPE=1

    Analysis Variable : WEIGHTGAIN

    N          Mean       Std Dev       Minimum       Maximum
    7    51.4571429     4.7475808    44.8000000    56.0000000

    FEEDTYPE=2

    Analysis Variable : WEIGHTGAIN

    N          Mean       Std Dev       Minimum       Maximum
    6    54.9666667     4.7944412    51.3000000    64.3000000

    In this first version of the output the BY statement (along with the PROC SORT) creates two tables, one for each value of the BY variable. In this next example, the CLASS statement produces a single table broken down by group (FEEDTYPE).

    Summary statistics USING CLASS

    Analysis Variable : WEIGHTGAIN

    FEEDTYPE    N Obs    N          Mean       Std Dev       Minimum       Maximum
           1        7    7    51.4571429     4.7475808    44.8000000    56.0000000
           2        6    6    54.9666667     4.7944412    51.3000000    64.3000000

    Hands on Exercise:

    1. Modify the above program to output the following statistics: N MEAN MEDIAN MIN MAX
    2. Use MAXDEC=2 to limit number of decimals in output.
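    One possible way to approach the exercise is sketched here. It assumes a data set named FERTILIZER with the variables FEEDTYPE and WEIGHTGAIN, as in Example 2; the title text is made up for this illustration, and the CLASS statement could equally be replaced by the PROC SORT plus BY approach.

```sas
* Sketch only: assumes the FERTILIZER data set from Example 2 already exists;
PROC MEANS DATA=FERTILIZER N MEAN MEDIAN MIN MAX MAXDEC=2;
VAR WEIGHTGAIN;
CLASS FEEDTYPE;
TITLE 'Hands-on exercise: selected statistics by FEEDTYPE';
RUN;
```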

    EXAMPLE 3: Using PROC MEANS to find OUTLIERS

    PROC MEANS is a quick way to find large or small values in your data set that may be considered outliers (see PROC UNI…

    perform a single sample t-test. For example, suppose your data contained the variables WBEFORE and WAFTER (before and after weight on a diet) for 8 subjects. To perform a paired t-test using PROC MEANS, follow these steps:

    1. Read in your data.
    2. Calculate the difference between the two observations (WLOSS is the amount of weight lost), and
    3. Report the mean loss, t-statistic and p-value using PROC MEANS.

    The hypotheses for this test are:

    Ho: Loss = 0 (The average weight loss was 0)

    Ha: Loss ≠ 0 (The weight loss was different than 0)

    For example, the following code performs a paired t-test for weight loss data:

    (PROCMEANS4.SAS)

    DATA WEIGHT;
    INPUT WBEFORE WAFTER;
    * Calculate WLOSS in the DATA step *;
    WLOSS=WAFTER-WBEFORE;
    DATALINES;
    4542 42C411 4B451 45745 45174 CC2 C 41
    ;
    ODS RTF;
    PROC MEANS N MEAN T PRT;
    VAR WLOSS;
    TITLE 'Paired t-test example using PROC MEANS';
    RUN;
    ODS RTF CLOSE;

    Notice that the actual test is performed on the new variable called WLOSS, and that is why it is the only variable requested in the PROC MEANS statement. This is essentially a one-sample t-test. The statistics of interest are the mean of WLOSS, the t-statistic associated with the null hypothesis for WLOSS and the p-value. The SAS output is as follows:

    Paired t-test example using PROC MEANS

    Analysis Variable : WLOSS

    N           Mean    t Value    Pr > |t|
    8    -22.7500000      -2.79      0.0270

    The mean of the variable WLOSS is -22.75. The t-statistic associated with the null hypothesis is -2.79, and the p-value for this paired t-test is p = 0.027, which provides evidence to reject the null hypothesis.
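    For readers who want to see how the reported numbers relate to one another, the one-sample t-statistic on the differences can be written as below. The symbols are notation introduced here, not taken from the tutorial: \bar{d} is the mean of WLOSS, s_d its sample standard deviation, and n = 8 the number of subjects. The standard error is not printed in the output; the value shown is only what the rounded mean and t value imply.

```latex
t = \frac{\bar{d}}{s_d / \sqrt{n}},
\qquad
\frac{s_d}{\sqrt{n}} = \frac{\bar{d}}{t} \approx \frac{-22.75}{-2.79} \approx 8.15
```

    Adding STDERR to the list of requested statistics would let PROC MEANS report this standard error directly.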

    EXAMPLE 5: Using PROC MEANS to output statistics (advanced)

    Suppose you have a data set and you want to add a column containing a z-statistic based on the mean and standard deviation of a variable. Here is one way to do that.

    The following data set contains weights of 12 children. You want to add a column of the difference of the scores from the mean, based on the information in the WEIGHT variable. For good measure, also calculate the z-score.

    DATA WT;
    INPUT WEIGHT;
    DATALINES;
    64
    71
    53
    67
    55
    58
    77
    57
    56
    51
    76
    68
    ;
    PROC MEANS NOPRINT DATA=WT; VAR WEIGHT;
    OUTPUT OUT=WTMEANS MEAN=WTMEAN STDDEV=WTSD;
    RUN;
    DATA WTDIFF; SET WT;
    IF _N_=1 THEN SET WTMEANS;
    DIFF=WEIGHT-WTMEAN;
    Z=DIFF/WTSD; * CREATES STANDARDIZED SCORE (Z-SCORE);
    RUN;
    ODS RTF;
    PROC PRINT DATA=WTDIFF; VAR WEIGHT DIFF Z;
    RUN;
    ODS RTF CLOSE;

    The statement

    OUTPUT OUT=WTMEANS MEAN=WTMEAN STDDEV=WTSD;

    writes the mean (WTMEAN) and standard deviation (WTSD) of WEIGHT to a new data set named WTMEANS.

    Obs    WEIGHT      DIFF           Z
      1        64      1.25     0.13910
      2        71      8.25     0.91808
      3        53     -9.75    -1.08501
      4        67      4.25     0.47295
      5        55     -7.75    -0.86244
      6        58     -4.75    -0.52859
      7        77     14.25     1.58578
      8        57     -5.75    -0.63988
      9        56     -6.75    -0.75116
     10        51    -11.75    -1.30757
     11        76     13.25     1.47450
     12        68      5.25     0.58424
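    As a check on how the DIFF and Z columns are formed, the calculation for the first observation (WEIGHT = 64) can be worked by hand. The symbols x_i, \bar{x} and s are notation introduced here for the observation, the mean of WEIGHT (62.75) and its sample standard deviation (about 8.9861).

```latex
z_i = \frac{x_i - \bar{x}}{s},
\qquad
z_1 = \frac{64 - 62.75}{8.9861} = \frac{1.25}{8.9861} \approx 0.1391
```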

    NOTE: You could also get standardized values using PROC STANDARD.

    PROC STANDARD DATA=WT MEAN=0 STD=1 OUT=ZSCORES;
    VAR WEIGHT;
    RUN;
    PROC PRINT DATA=ZSCORES;
    RUN;

    End of tutorial

    See http://www.stattutorials.com/SAS