purrr

DRAFT

these are not slides from a talk!

I refer to them before and during live coding while teaching STAT 545 and DSCI 523

don’t expect them to stand on their own

more material developing here:

https://jennybc.github.io/purrr-tutorial/index.html

what is purrr?

functional programming

blah blah blah

ok I admit it:

FP not actually front of mind when I use purrr

what does purrr help me do?

iterate in a data-structure-informed way

tolerate list-columns in data frames

with consistent UI across a large family of fxns

and return values that are ready for further computation

for every X

do Y

return combined results like Z

X and Z will make reference to actual R data structures

Y will be a function, possibly anonymous

like for i in 1 to n … but much higher level
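a minimal sketch of the "for every X, do Y, return Z" pattern, next to the for loop it replaces. the input vector is made up for illustration; here X is each element of a numeric vector, Y is sqrt(), and Z is a double vector.

```r
library(purrr)

# X: every element of a numeric vector (hypothetical example data)
x <- c(1, 4, 9)

# the "for i in 1 to n" version: allocate the output, loop, fill
out_loop <- numeric(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- sqrt(x[i])
}

# the purrr version: Y is sqrt(), Z is a double vector
out_purrr <- map_dbl(x, sqrt)
```

both give the same result; the purrr call states the input, the function, and the return type in one line.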

iterate in a data-structure-informed way

for every GitHub username

do GET https://api.github.com/users/username

and give me HTTP responses in a list

https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html

iterate in a data-structure-informed way

for every HTTP response

extract the “name” element

and give me a character vector

https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
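a small sketch of the "extract the name element" step. the `users` list below is a hypothetical stand-in for parsed API responses, not real GitHub output; the linked tutorial works with real data.

```r
library(purrr)

# hypothetical stand-in for parsed GitHub API responses
users <- list(
  list(login = "user_a", name = "User A", id = 1L, location = "Vancouver"),
  list(login = "user_b", name = "User B", id = 2L, location = "Seattle")
)

# for every user, extract the "name" element, return a character vector
user_names <- map_chr(users, "name")
```

passing a string as .f is a purrr shortcut for "extract the element with this name".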

iterate in a data-structure-informed way

for every HTTP response

extract the elements "login", "name", "id", "location"

and give me a data frame

https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
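a sketch of the "several elements, give me a data frame" step, again on a hypothetical `users` list rather than real API output. it assumes dplyr is installed, since map_df() row-binds with dplyr.

```r
library(purrr)

# hypothetical stand-in for parsed GitHub API responses
users <- list(
  list(login = "user_a", name = "User A", id = 1L, location = "Vancouver"),
  list(login = "user_b", name = "User B", id = 2L, location = "Seattle")
)

# for every user, keep four elements with `[`, then row-bind into a data frame
user_df <- map_df(users, `[`, c("login", "name", "id", "location"))
```

each user becomes one row and each extracted element becomes one column.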

iterate in a data-structure-informed way

for every row in a data frame

create a MIME object

and give me a list

https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html

iterate in a data-structure-informed way

for every MIME object

send an email

and return send status as a list

https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html

iterate in a data-structure-informed way

for every tuple (string, pos of substring starts, pos of substring ends)

extract the substrings

and give me a list of character vectors

https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html

inspect, query, modify

inspect

str()

str(my_list, max.level = 1)

str(my_list[[i]], list.len = 10)

listviewer::jsonedit()
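a quick sketch of the inspection calls on a made-up nested list (`my_list` here is hypothetical data standing in for parsed JSON). only base R is needed for str().

```r
# hypothetical nested list standing in for parsed JSON
my_list <- list(
  list(login = "user_a", id = 1L, repos = list("r1", "r2")),
  list(login = "user_b", id = 2L, repos = list("r3"))
)

# top level only: how many elements are there, and what kind?
str(my_list, max.level = 1)

# drill into one element, capping how many components get printed
str(my_list[[1]], list.len = 10)

# interactive alternative, if the listviewer package is installed:
# listviewer::jsonedit(my_list)
```

max.level limits nesting depth and list.len limits how many list elements are shown, which keeps the output readable for big lists.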

map(.x, .f, ...)

.x is a vector

“for every X” = for every element of .x

remember lists are vectors

remember data frames are lists
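a tiny sketch of what "data frames are lists" buys you: mapping over a data frame means mapping over its columns. the data frame is made up for illustration.

```r
library(purrr)

# hypothetical data frame
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# a list is a vector, and a data frame is a list
list_is_vector <- is.vector(list(1, "a"))
df_is_list <- is.list(df)

# so "for every element of .x" means "for every column" here
col_types <- map_chr(df, typeof)
```

the result keeps the column names, so `col_types` is a named character vector.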

map(.x, .f, ...)

.f is a function

possibly specified with shortcuts

all shown in the worked examples

“do Y” = .f(.x[[i]], …)
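a sketch of the main ways to specify .f, on a made-up list. the worked examples on the tutorial site show these in real contexts; the names below are hypothetical.

```r
library(purrr)

# hypothetical list of records
x <- list(list(name = "a", n = 1), list(name = "b", n = 2))

# shortcut: a string extracts the element with that name
by_name <- map(x, "name")

# shortcut: an integer extracts the element at that position
by_pos <- map(x, 2)

# an ordinary anonymous function
by_fun <- map(x, function(el) el$n * 10)

# shortcut: a formula, where .x stands for the current element
by_formula <- map(x, ~ .x$n * 10)
```

the last two calls are equivalent; the formula form is just less typing.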

“give me a Z”

map(.x, .f, …) can be thought of as map_list(.x, .f, …)

“give me a Z”

map_lgl(.x, .f, ...)

map_chr(.x, .f, ...)

map_int(.x, .f, ...)

map_dbl(.x, .f, ...)

return an atomic vector of requested type
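a sketch of the type-specific variants on a made-up list. the last call shows the flip side: if .f does not return the requested type, you get an error instead of a silently different result.

```r
library(purrr)

# hypothetical named list with elements of different types and lengths
x <- list(a = 1:3, b = letters[1:2], c = numeric(0))

lens   <- map_int(x, length)      # integer vector
is_num <- map_lgl(x, is.numeric)  # logical vector
types  <- map_chr(x, typeof)      # character vector
means  <- map_dbl(list(1:3, 4:6), mean)  # double vector

# asking for integers from a function that returns strings is an error
bad <- tryCatch(map_int(x, typeof), error = function(e) "error")
```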

“give me a Z”

map_df(.x, .f, ..., .id = NULL)

basically: map() then dplyr::bind_rows()
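a sketch of the "map() then dplyr::bind_rows()" equivalence, including what .id does. the helper `summarise_one()` and the input list are hypothetical. note that recent purrr releases mark map_df() as superseded in favor of map() followed by list_rbind(); it still works, and dplyr must be installed.

```r
library(purrr)

# hypothetical named list of integer vectors
x <- list(a = 1:3, b = 4:6)

# hypothetical helper: one-row data frame per element
summarise_one <- function(v) {
  data.frame(n = length(v), total = sum(v))
}

# one step, with the list names stored in a "source" column
via_map_df <- map_df(x, summarise_one, .id = "source")

# the same thing spelled out in two steps
via_bind <- dplyr::bind_rows(map(x, summarise_one), .id = "source")
```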

“give me a Z”

walk(.x, .f, …) can be thought of as map_nothing(.x, .f, …)
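a sketch of walk() for side effects. the messages are made up; the point is that .f is called for what it does (printing, writing files, sending email), not for what it returns.

```r
library(purrr)

# hypothetical inputs
msgs <- c("first", "second")

# .f is called once per element for its side effect (printing here)
returned <- walk(msgs, ~ cat("processing", .x, "\n"))
```

walk() returns its input invisibly, which makes it easy to keep going in a pipeline.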

“for every X”

map2(.x, .y, .f, ...)

X = (element i of .x, element i of .y)

pmap(.l, .f, ...)

X = tuple of the i-th elements of the lists in .l

remember a data frame is a list!
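a sketch of iterating over two or more inputs in parallel, on made-up data. because a data frame is a list of columns, passing one to pmap() gives you row-wise iteration, with column names matched to the argument names of .f.

```r
library(purrr)

# map2: X is the pair (element i of .x, element i of .y)
labels <- map2_chr(c("a", "b"), c(1, 2), ~ paste0(.x, .y))

# pmap: X is the tuple of i-th elements of everything in .l
# hypothetical data frame; each row becomes one call to .f
df <- data.frame(x = c(1, 2), y = c(10, 20), z = c(100, 200))
row_sums <- pmap_dbl(df, function(x, y, z) x + y + z)
```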

how might you be doing such things today?

maybe you don’t, because you don’t know how 😔

for loops

apply(), [slvmt]apply(), split(), by()

the plyr package: [adl][adl_]ply()

with dplyr: df %>% group_by() %>% do()

this is not my first R rodeo

I have gone through intense, evangelical phases of iterating with base “apply” functions and plyr

I highly recommend you give purrr a try

relationship to base R approaches

there’s nothing you can do with purrr that you cannot do with base

specifically: map() is basically lapply()

main reasons to use purrr:

- shortcuts facilitate anonymous functions for .f

- greater encouragement for type-safety

- consistent API across large family of functions
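a sketch of these points side by side with base R, on made-up inputs. it is meant as an illustration of the claims above, not a full comparison.

```r
library(purrr)

# hypothetical input
x <- list(1:3, 4:6)

# map() is basically lapply()
same_as_lapply <- identical(map(x, sum), lapply(x, sum))

# shortcut for anonymous functions
base_anon  <- lapply(x, function(v) v * 2)
purrr_anon <- map(x, ~ .x * 2)

# type-safety: sapply()'s return type depends on the input ...
sapply_empty <- sapply(list(), length)   # a list
# ... while map_int() gives an integer vector, even for empty input
purrr_empty <- map_int(list(), length)
```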

tolerate list-columns in data frames

tidyverse lifestyle ~ work in a data frame when possible

what about stuff that can’t be stored as an atomic vector?

- stick it in a list-column

but list-columns are awful!

- get better at inspecting lists

- get better at computing on lists

use purrr::map() and friends

- probably inside dplyr::mutate()

ok there’s a whole section I want to write here, with more worked examples on the site, etc.

but that’s not happening this round

what follows are a few hints of what I will say

every time someone asks:

how can I iterate over a list, but also access the index i or the list names at the same time?

they should probably be working inside a data frame, with a list column and a variable for i or the names

use tibble::enframe() on your vexing_list and have at it with mutate(new_var = map_*(vexing_list, f)) or map2() or pmap()
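a sketch of the enframe() idea on a made-up `vexing_list`. it assumes dplyr and tibble are installed. once the list is a list-column, the names and the index i are just ordinary variables sitting next to it.

```r
library(purrr)
library(dplyr)

# hypothetical list where you want the names and the index while iterating
vexing_list <- list(a = 1:3, b = letters[1:5])

df <- tibble::enframe(vexing_list) %>%   # columns: name, value (a list-column)
  mutate(
    i = row_number(),                                    # the index
    n = map_int(value, length),                          # compute on the list-column
    label = map2_chr(name, n, ~ paste0(.x, ": ", .y))    # use names and results together
  )
```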

Great example is Gapminder

draw on http://r4ds.had.co.nz/many-models.html and STAT 545 Gapminder materials (translate from plyr and dplyr)

- natural to nest at country level and put data in a list-column

- fit models, etc. by mutating the data list-column

- extract model summaries by mutating the fits with broom fxns
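a sketch of the nest / fit / extract workflow. to stay self-contained it swaps in the built-in mtcars data (nested by cyl) for Gapminder (nested by country), and pulls the slope with coef() rather than broom; both substitutions are assumptions made for illustration. it assumes dplyr and tidyr are installed.

```r
library(purrr)
library(dplyr)
library(tidyr)

nested <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%            # one row per cyl, data in a list-column
  ungroup() %>%
  mutate(
    # fit one model per row by mutating the data list-column
    fit = map(data, ~ lm(mpg ~ wt, data = .x)),
    # extract a model summary by mutating the fits
    slope = map_dbl(fit, ~ coef(.x)[["wt"]])
  )
```

with broom installed, something like map(fit, broom::tidy) would give a list-column of tidy coefficient tables instead of a single number.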

more far out example is

https://jennybc.github.io/purrr-tutorial/ex24_xml-wrangling.html

where I put XML nodesets in a data frame

each row is one row of a Google Sheet

I proceed to wrangle it on the way to get cell contents

also, just to be clear:

no one in their right mind enjoys having list-columns in a data frame

but the benefits often outweigh the costs

especially if you have the right tools and a productive mindset

it’s always a temporary state

goal is always to get back to something simpler

ok this is where things just peter out 😬

and we go back to live coding

tweets

My economic policy speech will be carried live at 12:15 P.M. Enjoy!

Join me in Fayetteville, North Carolina tomorrow evening at 6pm. Tickets now available at: https://t.co/Z80d4MYIg8

The media is going crazy. They totally distort so many things on purpose. Crimea, nuclear, "the baby" and so much more. Very dishonest!

I see where Mayor Stephanie Rawlings-Blake of Baltimore is pushing Crooked hard. Look at the job she has done in Baltimore. She is a joke!

Bernie Sanders started off strong, but with the selection of Kaine for V.P., is ending really weak. So much for a movement! TOTAL DISRESPECT

Crooked Hillary Clinton is unfit to serve as President of the U.S. Her temperament is weak and her opponents are strong. BAD JUDGEMENT!

The Cruz-Kasich pact is under great strain. This joke of a deal is falling apart, not being honored and almost dead. Very dumb!

substring(text, first, last)

match_first

[[1]]
[1] -1

[[2]]
[1] -1

[[3]]
[1] 20

[[4]]
[1] 134

[[5]]
[1] 28 95

[[6]]
[1] 87 114

[[7]]
[1] 50 112 123

match_last

[[1]]
[1] -3

[[2]]
[1] -3

[[3]]
[1] 24

[[4]]
[1] 137

[[5]]
[1] 33 98

[[6]]
[1] 90 119

[[7]]
[1] 53 115 126

https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html

pmap(list(text = tweets, first = match_first, last = match_last), substring)
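a runnable sketch of the whole substring extraction, so the pmap() call above has something to run on. the tweets are abridged versions of the ones shown, and the regex is a shortened, assumed pattern chosen for illustration; the linked tutorial has the full worked example.

```r
library(purrr)

# abridged tweets (shortened for illustration)
tweets <- c(
  "Join me in Fayetteville, North Carolina tomorrow evening at 6pm.",
  "The media is going crazy. Very dishonest!",
  "Bernie Sanders started off strong, but is ending really weak."
)

# assumed pattern, for illustration only
regex <- "crazy|weak|strong"

# gregexpr() gives match starts, with match lengths as an attribute
matches <- gregexpr(regex, tweets)
match_first  <- map(matches, as.vector)
match_length <- map(matches, attr, which = "match.length")

# end position = start + length - 1 (no match: -1 + -1 - 1 = -3)
match_last <- map2(match_first, match_length, ~ .x + .y - 1)

# for every tuple (string, starts, ends), extract the substrings
words <- pmap(
  list(text = tweets, first = match_first, last = match_last),
  substring
)
```

the list names text, first and last match substring()'s argument names, so no wrapper function is needed. a tweet with no match yields an empty string.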
