04a Functions for NAs Strings Etc

  • 8/19/2019 04a Functions for NAs Strings Etc

    Data Analysis & Data Science with R

    Functions for dealing with NAs, NULLs, dates, strings, regular expressions, etc.

    by Marin Fotache

    Al.I. Cuza University of Iași

    Faculty of Economics and Business Administration

    Department of Accounting, Information Systems and Statistics

    R script associated with this presentation

    04a_functions_for_NAs_strings_etc.R

    http://1drv.ms/1E1m81i

    Web sites with R tutorials for system functions

    http://www.sr.bham.ac.uk/~ajrs/R/r-function_list.html

    Two main types of missing values

    NA:

    ◦ Stands for Not Available

    ◦ Is the equivalent of NULL in relational databases

    ◦ When importing data from Excel, tab-delimited files, etc., unknown values are usually represented by NAs

    ◦ Not to be confused with the "NA" string (which sometimes occurs when importing)

    ◦ Main function: is.na()

    NULL:

    ◦ Completely different from NULLs in relational databases

    ◦ Within a vector, an element can be NA but not NULL (NULL is an empty, zero-length object, not a value); if used inside a vector, a NULL element simply disappears
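As a minimal sketch of the distinction between a real NA and the "NA" string, the example below uses made-up vectors and a made-up CSV snippet (none of these names or values come from the presentation). It assumes the import goes through `read.csv`, whose `na.strings` argument lists the markers to be read as NA.

```r
# Illustrative values only; none of these objects come from the slides
v_na  <- c(10, NA, 30)       # NA keeps its position in the vector
v_str <- c("a", "NA", NA)    # "NA" is an ordinary string, the bare NA is missing

na_flags_num <- is.na(v_na)    # FALSE  TRUE FALSE
na_flags_chr <- is.na(v_str)   # FALSE FALSE  TRUE

# A small CSV held in a string; na.strings lists the markers to read as NA
csv_text <- "id,email\n1,ana@example.org\n2,NA\n3,\n"
imported <- read.csv(text = csv_text, na.strings = c("NA", ""))
na_flags_import <- is.na(imported$email)   # FALSE  TRUE  TRUE
```

Listing the empty string in `na.strings` is a design choice for this sketch: by default an empty character field would be imported as "" rather than NA.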

    Function is.na

    Create a very simple vector:

    > y <- c(1, 2, 3, NA)  # assumed values; only the is.na() result survives in the source

    > is.na(y)

    [1] FALSE FALSE FALSE TRUE

    Function na.fail 

    Function na.fail checks whether a data set contains NA values. It generates an error if there is at least one NA in any column; otherwise it returns the data set unchanged.

    Data frames student_gpi, patientdata, mpi, FuelEfficiency.new and ToyotaCorolla do not contain NA values

    > na.fail(student_gpi)

         name age scholarship lab_assessment final_grade

    ... (five student records, printed unchanged)

    Function na.fail (cont.)

    Data frame leadership contains at least one NA value, so na.fail will generate an error:

    > na.fail(leadership)

    Error in na.fail.default(leadership) : missing values in object

    > leadership

      manager date country gender age q1 q2 q3 q4 q5

    ... (data rows; one row has NA for q4 and q5)
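The behaviour of na.fail in both situations can be sketched with two small, hypothetical data frames (`complete_df` and `partial_df` are stand-ins, not the data sets used in the presentation). Wrapping the call in `tryCatch` is one possible way to keep a script running after the error.

```r
# Hypothetical data frames standing in for the ones on the slides
complete_df <- data.frame(id = 1:3, score = c(9, 8.5, 7))
partial_df  <- data.frame(id = 1:3, score = c(9, NA, 7))

# Without NAs, na.fail() returns its argument unchanged
unchanged <- identical(na.fail(complete_df), complete_df)   # TRUE

# With NAs, na.fail() stops; tryCatch() captures the error message instead
result <- tryCatch(
  na.fail(partial_df),
  error = function(e) conditionMessage(e)
)
result   # "missing values in object"
```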

    Check NAs for a range of elements

    Display, for each student, whether the e-mail is missing or not

    > head(studs201…)

    > is.na(studs201…[, "email"])

    Display only the students with a missing e-mail address:

    > studs201…[is.na(studs201…$email), ]

    Display, for each observation, whether variables q1:q5 are NA in data frame leadership:

    > is.na(leadership[, 6:10])

         q1    q2    q3    q4    q5

    [1,] FALSE FALSE FALSE FALSE FALSE

    [2,] FALSE FALSE FALSE FALSE FALSE

    [3,] FALSE FALSE FALSE FALSE FALSE
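A short sketch of the same idea on a made-up data frame (`survey` and its columns are assumptions for illustration): `is.na` applied to a range of columns yields a logical matrix, which can then be used to locate the NAs or to filter rows.

```r
# Hypothetical survey data; q1..q3 stand in for the question columns
survey <- data.frame(
  id = 1:4,
  q1 = c(5, 3, NA, 2),
  q2 = c(4, NA, NA, 1),
  q3 = c(5, 2, 4, 3)
)

na_flags <- is.na(survey[, c("q1", "q2", "q3")])  # logical matrix, one cell per value
na_positions <- which(na_flags, arr.ind = TRUE)   # row/column index of each NA
na_per_row <- rowSums(na_flags)                   # number of NAs in each row
rows_with_na <- survey[na_per_row > 0, ]          # rows 2 and 3
```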

    Counting NAs

    Counting the number of NA values within an entire data frame is possible with function sum and the following syntax:

    sum(is.na(the.data.frame))

    How many NA values are there in data frame leadership?

    > sum(is.na(leadership))

    [1] 2

    How many NA values are there in data frame comp?

    > sum(is.na(comp))

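The counting idiom works because `is.na` returns logical values and `sum` treats TRUE as 1. A minimal sketch on an invented data frame (`df` and its contents are assumptions), also showing a per-column count with `colSums`:

```r
# Hypothetical data frame with a few missing values
df <- data.frame(
  name  = c("Ana", "Ion", NA, "Maria"),
  email = c(NA, "ion@example.org", NA, NA),
  age   = c(21, NA, 23, 22)
)

total_na   <- sum(is.na(df))       # 5 NAs in the whole data frame
na_per_col <- colSums(is.na(df))   # name 1, email 3, age 1
```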
    Function complete.cases

    Display observations/rows which have at least one NA

    > leadership[!complete.cases(leadership), ]

      manager date country gender age q1 q2 q3 q4 q5

    ... UK M ... NA NA

    > comp[!complete.cases(comp), ]

    Counting how many observations/rows have at least one NA and how many have no NAs (are complete cases):

    > table(complete.cases(leadership))

    FALSE  TRUE

        1     …
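A minimal sketch of complete.cases on a hypothetical data frame (`staff` is an invented stand-in for leadership). It also shows `na.omit`, which is not mentioned on the slide but is a common companion for dropping incomplete rows.

```r
# Hypothetical data frame; rows 3 and 4 contain NAs
staff <- data.frame(
  manager = 1:4,
  q1 = c(5, 3, 3, NA),
  q2 = c(4, 5, NA, NA)
)

ok <- complete.cases(staff)         # TRUE for rows without any NA
incomplete <- staff[!ok, ]          # rows 3 and 4
counts <- table(ok)                 # FALSE: 2, TRUE: 2
cleaned <- na.omit(staff)           # keeps only rows 1 and 2

# Restricting the check to some columns: only row 4 has an NA in q1
ok_q1 <- complete.cases(staff[, c("manager", "q1")])
```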

    Counting/displaying NAs for variables and ranges

    How many students have no e-mail address?

    ◦ Using sum...

    > sum(is.na(studs201…$email))

    [1] 1618

    ◦ ...or table

    > table(is.na(studs201…$email))

    FALSE  TRUE

       78  1618

    Display the observations in which at least one value of variables q1:q5 is NA in data frame leadership

    > leadership[!complete.cases(leadership[6:10]), ]

      manager date country gender age q1 q2 q3

    NULLs

    Completely different from databases (NA is pretty close to the concept of NULL in relational databases)

    > v.with.null

    [1] 1 ? 4 5 6

    NULLs (cont.)

    > length(v.with.na)

    [1] 6

    > length(v.with.null)

    [1] 5

    > sum(v.with.na)

    [1] NA

    > sum(v.with.null)

    [1] 1…

    In a data frame, a variable/column set to NULL also disappears

    > names(adl2013_stud)

    [1] "Nr" "NumePren" "Matricol" "Email"

    > adl2013_stud$Nr <- NULL

    > names(adl2013_stud)

    [1] "NumePren" "Matricol" "Email"
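The contrast between NA and NULL can be reproduced with small vectors of our own (the values below are assumptions, not the ones from the presentation, and `students` is a hypothetical data frame):

```r
# Assumed example values
v_with_na   <- c(1, 2, NA, 4)
v_with_null <- c(1, 2, NULL, 4)   # NULL is dropped while the vector is built

len_na   <- length(v_with_na)     # 4: NA occupies a position
len_null <- length(v_with_null)   # 3: NULL leaves no trace

sum_na      <- sum(v_with_na)                 # NA: the missing value propagates
sum_na_rm   <- sum(v_with_na, na.rm = TRUE)   # 7: NA explicitly ignored
sum_null    <- sum(v_with_null)               # 7

# Assigning NULL to a column removes it from the data frame
students <- data.frame(Nr = 1:2, Name = c("A", "B"),
                       Email = c("a@example.org", NA))
students$Nr <- NULL
remaining <- names(students)      # "Name" "Email"
```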

    Functions for managing string variables

    Base R

    ◦ nchar

    ◦ substr

    ◦ strsplit

    ◦ paste, paste0, sprintf

    Package stringr

    ◦ str_c(): string concatenation = paste()

    ◦ str_length(): number of characters = nchar()

    ◦ str_sub(): extracts substrings = substring()

    ◦ str_dup(): duplicates characters = strrep() (R 3.3.0 and later)

    ◦ str_trim(): removes leading and trailing whitespace = trimws() (R 3.2.0 and later)

    ◦ str_pad(): pads a string = no direct equivalent

    ◦ str_wrap(): wraps a string paragraph = strwrap()

    ◦ word(): extracts words from a string = no equivalent
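A minimal sketch of the base R functions listed above, applied to an invented string (the input text is an assumption). Only base R is used so that the example runs without installing stringr; the stringr counterparts named in the list are expected to give the same results on this input.

```r
s <- "  Data Science with R  "

trimmed  <- trimws(s)                          # "Data Science with R"
n_chars  <- nchar(trimmed)                     # 19
first4   <- substr(trimmed, 1, 4)              # "Data"
words    <- strsplit(trimmed, " ")[[1]]        # "Data" "Science" "with" "R"
joined   <- paste("Data", "Science", sep = "_")  # "Data_Science"
q_names  <- paste0("q", 1:3)                   # "q1" "q2" "q3"
message_ <- sprintf("%s has %d characters", first4, nchar(first4))
```

Here `message_` evaluates to "Data has 4 characters"; `sprintf` is often preferable to nested `paste` calls when numbers need a fixed format.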

    Some web pages for processing strings in R

    Gaston Sanchez - Handling and Processing Strings in R

    http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf

    John Myles

    Regular expressions

    Vital for text/string/web searching

    Implemented in almost every programming language

    In SQL the basic mechanism is rudimentary and based on operators: LIKE, ILIKE, SIMILAR TO

    R has full support for regular expressions

    ◦ Functions in base R:

    grep, grepl

    sub, gsub

    regexpr, gregexpr

    ◦ Functions in package stringr:

    str_detect

    str_extract, str_extract_all, str_match, str_match_all

    str_locate, str_locate_all

    str_replace, str_replace_all

    str_split, str_split_fixed
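The base R functions listed above can be sketched on a few made-up strings. The e-mail pattern below is a deliberately simple assumption for illustration, not a complete address validator, and `regmatches` is an extra base R helper used here to pull out the matched text.

```r
emails  <- c("ana@example.org", "not-an-email", "ion.pop@example.com")
pattern <- "^[[:alnum:]._-]+@[[:alnum:].-]+\\.[a-z]+$"

is_email  <- grepl(pattern, emails)        # TRUE FALSE TRUE
which_ok  <- grep(pattern, emails)         # 1 3
user_name <- sub("@.*$", "", emails[1])    # "ana" (first match replaced)
no_vowels <- gsub("[aeiou]", "", "regular")  # "rglr" (all matches replaced)

txt <- "room 101, floor 3"
first_number <- regmatches(txt, regexpr("[0-9]+", txt))          # "101"
all_numbers  <- regmatches(txt, gregexpr("[0-9]+", txt))[[1]]    # "101" "3"
```

The pairs follow the same logic throughout: `grepl`/`grep` test or index, `sub`/`gsub` replace the first or every match, and `regexpr`/`gregexpr` locate the first or every match.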

    Web sites/video tutorials for regular expressions (generally and R specific)

    Basics of regular expressions (in general and in R)

    http://www.rexegg.com/regex-quickstart.html

    http://www.r-bloggers.com/regular-expression-and-associated-functions-in-r/

    http://www.r-bloggers.com/r-talk-on-regular-expressions-regex/

    Regular Expressions

    https://www.youtube.com/watch?v=NvHjYOilOf8