purrr
DRAFT
these are not slides from a talk!
I refer to them before and during live coding while teaching STAT 545 and DSCI 523
don’t expect them to stand on their own
more material developing here:
https://jennybc.github.io/purrr-tutorial/index.html
what is purrr?
functional programming
blah blah blah
ok I admit it:
FP not actually front of mind when I use purrr
what does purrr help me do?
iterate in a data-structure-informed way
tolerate list-columns in data frames
with consistent UI across a large family of fxns
and return values that are ready for further computation
for every X
do Y
return combined results like Z
X and Z will make reference to actual R data structures
Y will be a function, possibly anonymous
like for i in 1 to n … but much higher level
iterate in a data-structure-informed way
for every GitHub username
do GET https://api.github.com/users/username
and give me HTTP responses in a list
https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
for every HTTP response
extract the “name” element
and give me a character vector
https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
for every HTTP response
extract the elements "login", "name", "id", "location"
and give me a data frame
https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
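To make the "extract elements" steps concrete without calling the GitHub API, here is a minimal sketch that works on a hypothetical list standing in for already-parsed responses (the logins, names, and ids are made up). It uses `map_chr()` with the string shortcut for the character vector, and `map()` plus base `rbind()` for the data frame, so that only purrr is needed; the tutorial linked above works with the real API responses.

```r
library(purrr)

# Hypothetical stand-ins for parsed GitHub API responses (NOT real API output)
responses <- list(
  list(login = "octo_a", name = "Octo A", id = 101L, location = "Vancouver", public_repos = 12L),
  list(login = "octo_b", name = "Octo B", id = 102L, location = "Berlin", public_repos = 3L)
)

# for every response, extract "name", give me a character vector;
# the string "name" is a shortcut for function(r) r[["name"]]
user_names <- map_chr(responses, "name")

# for every response, extract four elements, give me a data frame:
# map() makes one single-row data frame per response, rbind() stacks them
fields <- c("login", "name", "id", "location")
user_df <- do.call(rbind, map(responses, function(r) as.data.frame(r[fields])))
user_df
```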
for every row in a data frame
create a MIME object
and give me a list
https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html
for every MIME object
send an email
and return send status as a list
https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html
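The "for every row in a data frame" pattern can be sketched with `pmap()`, which treats a data frame as a list of columns and hands each row to `.f` as arguments named after the columns. Everything here is hypothetical: `make_message()` builds a plain list rather than a real MIME object, and `fake_send()` only simulates a send status, so no email goes anywhere.

```r
library(purrr)

# Hypothetical recipients; each row should become one message object
addresses <- data.frame(
  to = c("a@example.com", "b@example.com"),
  name = c("Ana", "Bo"),
  stringsAsFactors = FALSE
)

# hypothetical builder: receives one row at a time as arguments `to` and `name`
make_message <- function(to, name) {
  list(to = to, subject = "Hello", body = paste0("Hi ", name, "!"))
}
messages <- pmap(addresses, make_message)   # a list, one element per row

# hypothetical stand-in for sending: just reports a status per message
fake_send <- function(msg) list(to = msg$to, status = "sent (simulated)")
statuses <- map(messages, fake_send)        # send status as a list
```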
for every tuple (string, start positions of the substrings, end positions of the substrings)
extract the substrings
and give me a list of character vectors
https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html
inspect, query, modify
inspect
str()
str(my_list, max.level = 1)
str(my_list[[i]], list.len = 10)
listviewer::jsonedit()
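A small sketch of the inspection calls, run on a made-up nested list that stands in for something like parsed JSON. `max.level` limits how deep `str()` goes and `list.len` caps how many components it shows; `listviewer::jsonedit()` is left commented out because it needs the listviewer package and an interactive viewer.

```r
# Hypothetical nested list, standing in for parsed JSON
my_list <- list(
  list(id = 1L, tags = c("a", "b"), meta = list(ok = TRUE)),
  list(id = 2L, tags = character(0), meta = list(ok = FALSE))
)

# overview: only the top level of the list
str(my_list, max.level = 1)

# drill into one element, showing at most 10 components
str(my_list[[1]], list.len = 10)

# interactive, browsable view (requires the listviewer package)
# listviewer::jsonedit(my_list)
```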
map(.x, .f, ...)
.x is a vector
“for every X” = for every element of .x
remember lists are vectors
remember data frames are lists
.f is a function
possibly specified with shortcuts
all shown in the worked examples
“do Y” = .f(.x[[i]], ...)
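A quick sketch of the ways to specify `.f`, on a made-up list `x` and the built-in `mtcars` data: a named function, an anonymous function, the formula shortcut (where `.x` is the current element), extra arguments passed through `...`, and the string/integer shortcuts that extract by name or position. It also shows that a data frame is iterated column by column.

```r
library(purrr)

x <- list(a = 1:3, b = c(10, 20), c = numeric(0))

map(x, length)                      # named function: length(.x[[i]])
map(x, function(v) sum(v) * 2)      # anonymous function
doubled <- map(x, ~ sum(.x) * 2)    # formula shortcut, same result
map(x, mean, na.rm = TRUE)          # extra args in ... are passed to .f

map(mtcars[1:3], max)               # data frame = list of columns

# string / integer shortcuts extract by name / position
records <- list(list(id = 1, name = "x"), list(id = 2, name = "y"))
map(records, "name")
map(records, 1)
```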
“give me a Z”
map(.x, .f, ...) can be thought of as map_list(.x, .f, ...)
map_lgl(.x, .f, ...)
map_chr(.x, .f, ...)
map_int(.x, .f, ...)
map_dbl(.x, .f, ...)
return an atomic vector of the requested type
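A short sketch of the typed variants on a made-up list, plus the type-safety payoff: if `.f` returns something that does not fit the requested type and length (here `range()` returns two numbers per element), `map_dbl()` errors instead of quietly returning a differently shaped result.

```r
library(purrr)

x <- list(1:3, 4:6, 7:9)

map_int(x, length)                       # integer vector
map_dbl(x, mean)                         # double vector
map_lgl(x, ~ any(.x > 5))                # logical vector
map_chr(x, ~ paste(.x, collapse = "-"))  # character vector

# length-2 results can't become one double each: map_dbl() throws an error
res <- tryCatch(map_dbl(x, range), error = function(e) "error")
```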
map_df(.x, .f, ..., .id = NULL) basically: map() then dplyr::bind_rows()
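To spell out "map() then bind rows", here is a sketch with a hypothetical per-element summary function. It does the two steps separately and uses base `rbind()` as a stand-in for `dplyr::bind_rows()`, so it runs without dplyr. With dplyr installed, the one-liner is `map_df(x, summarise_one, .id = "name")`, where `.id` adds a column of element names; in purrr 1.0.0 and later, `map_df()` is superseded in favour of `map()` followed by `list_rbind()`.

```r
library(purrr)

# hypothetical per-element summary: one single-row data frame per element
summarise_one <- function(v) data.frame(n = length(v), mean = mean(v))
x <- list(a = 1:3, b = c(10, 20))

pieces <- map(x, summarise_one)     # step 1: a list of data frames
combined <- do.call(rbind, pieces)  # step 2: stack them (base-R stand-in for bind_rows)
combined
```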
walk(.x, .f, ...) can be thought of as map_nothing(.x, .f, ...)
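A tiny sketch of `walk()`: `.f` is called only for its side effect (here printing a message about hypothetical file names; nothing is actually written), and `walk()` returns `.x` invisibly, which is what lets it sit in the middle of a pipeline.

```r
library(purrr)

paths <- c("a.csv", "b.csv")   # hypothetical file names

# side effect only; the return value is .x itself, invisibly
out <- walk(paths, function(p) cat("would write", p, "\n"))
```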
“for every X”
map2(.x, .y, .f, ...): X = (element i of .x, element i of .y)
pmap(.l, .f, ...): X = tuple of the i-th elements of the lists in .l
remember a data frame is a list!
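A minimal sketch of parallel iteration on made-up inputs: `map2_dbl()` walks two vectors in lockstep, `pmap()` matches the lists in `.l` to the arguments of `.f` by name (here `seq(from, to)`), and since a data frame is a list of columns, `pmap_dbl()` on a data frame iterates over its rows.

```r
library(purrr)

x <- c(1, 2, 3)
y <- c(10, 20, 30)
sums <- map2_dbl(x, y, ~ .x + .y)   # X = (x[[i]], y[[i]])

# pmap: i-th element of each list, passed as named arguments to seq()
ranges <- pmap(list(from = c(1, 10), to = c(3, 12)), seq)

# a data frame is a list, so this goes row by row
df <- data.frame(a = 1:2, b = 3:4)
products <- pmap_dbl(df, function(a, b) a * b)
```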
how might you be doing such things today?
maybe you don’t, because you don’t know how 😔
for loops
apply(), [slvmt]apply(), split(), by()
the plyr package: [adl][adl_]ply()
with dplyr: df %>% group_by() %>% do()
this is not my first R rodeo
I have gone through intense, evangelical phases of iterating with base “apply” functions and plyr
I highly recommend you give purrr a try
relationship to base R approaches
there’s nothing you can do with purrr that you cannot do with base R
specifically: map() is basically lapply()
main reasons to use purrr:
- shortcuts facilitate anonymous functions for .f
- greater encouragement for type-safety
- consistent API across large family of functions
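A small side-by-side sketch on a made-up list: `map()` and `lapply()` give identical results, `map_dbl()` matches `vapply()` with an explicit `numeric(1)` template, and the contrast with `sapply()` shows the type-safety point: on empty input `sapply()` falls back to returning a list, while `map_int()` still returns an integer vector.

```r
library(purrr)

x <- list(a = 1:3, b = 4:6)

# map() is basically lapply(): both return the same named list
same_list <- identical(map(x, sum), lapply(x, sum))

# type-stable output: map_dbl() vs vapply() with a declared template
map_dbl(x, mean)
vapply(x, mean, numeric(1))

# sapply() guesses the output shape; on empty input it returns a list
sapply(list(), length)
map_int(list(), length)   # still an integer vector, of length 0
```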
tolerate list-columns in data frames
tidyverse lifestyle ~ work in a data frame when possible
what about stuff that can’t be stored as an atomic vector?
- stick it in a list-column
but list-columns are awful!
- get better at inspecting lists
- get better at computing on lists
use purrr::map() and friends
- probably inside dplyr::mutate()
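A minimal sketch of computing on a list-column inside `dplyr::mutate()`, using a made-up tibble whose `values` column holds vectors of different lengths. The typed `map_*()` functions turn the list-column back into ordinary atomic columns.

```r
library(dplyr)
library(purrr)
library(tibble)

# hypothetical data: one vector of a different length per row
df <- tibble(
  id = c("a", "b", "c"),
  values = list(1:3, c(2.5, 7), numeric(0))
)

# typed map_*() inside mutate() gives back ordinary atomic columns
df <- df %>%
  mutate(
    n = map_int(values, length),
    total = map_dbl(values, sum)
  )
df
```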
ok there’s a whole section I want to write here, with more worked examples on the site, etc.
but that’s not happening this round
what follows are a few hints of what I will say
every time someone asks:
how can I iterate over a list, but also access the index i or the list names at the same time?
they should probably be working inside a data frame, with a list column and a variable for i or the names
use tibble::enframe() on your vexing_list (it puts the names in a column called name and the elements in a list-column called value) and have at it with mutate(new_var = map_*(value, f)) or map2() or pmap()
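A sketch of the enframe() move on a hypothetical `vexing_list`: once the list is a tibble, the names (`name`), the elements (`value`), and an index `i` are all just columns, so `map2_chr()` can use a name and its element together.

```r
library(dplyr)
library(purrr)
library(tibble)

# hypothetical named list we want to iterate over *with* its names
vexing_list <- list(alpha = 1:3, beta = letters[1:2])

df <- enframe(vexing_list) %>%   # columns: name, value (a list-column)
  mutate(
    i = row_number(),
    len = map_int(value, length),
    label = map2_chr(name, value, ~ paste0(.x, ": ", length(.y), " items"))
  )
df
```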
Great example is Gapminder
draw on
http://r4ds.had.co.nz/many-models.html
and
STAT 545 Gapminder materials (translate from plyr and dplyr)
natural to:
- nest at country level and put the data in a list-column
- fit models, etc. by mutating the data list-column
- extract model summaries by mutating the fits w/ broom functions
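A sketch of the nest / fit / extract workflow, with two substitutions so it runs with fewer dependencies: the built-in `mtcars` data nested by `cyl` stands in for Gapminder nested by country, and `coef()` stands in for broom when pulling out the slope. The model `mpg ~ wt` is just an illustrative choice.

```r
library(dplyr)
library(tidyr)
library(purrr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%      # one row per group, data in a list-column `data`
  ungroup() %>%
  mutate(
    # fit a model per group by mutating the data list-column
    fit = map(data, ~ lm(mpg ~ wt, data = .x)),
    # extract a summary by mutating the fits (coef() instead of broom here)
    slope = map_dbl(fit, ~ coef(.x)[["wt"]])
  )
by_cyl
```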
more far out example is
https://jennybc.github.io/purrr-tutorial/ex24_xml-wrangling.html
where I put XML nodesets in a data frame (each row is one row of a Google Sheet) and proceed to wrangle it on the way to the cell contents
also, just to be clear:
no one in their right mind enjoys having list-columns in a data frame
but the benefits often outweigh the costs, especially if you have the right tools and a productive mindset
it’s always a temporary state; the goal is always to get back to something simpler
ok this is where things just peter out 😬
and we go back to live coding
My economic policy speech will be carried live at 12:15 P.M. Enjoy!
Join me in Fayetteville, North Carolina tomorrow evening at 6pm. Tickets now available at: https://t.co/Z80d4MYIg8
The media is going crazy. They totally distort so many things on purpose. Crimea, nuclear, "the baby" and so much more. Very dishonest!
I see where Mayor Stephanie Rawlings-Blake of Baltimore is pushing Crooked hard. Look at the job she has done in Baltimore. She is a joke!
Bernie Sanders started off strong, but with the selection of Kaine for V.P., is ending really weak. So much for a movement! TOTAL DISRESPECT
Crooked Hillary Clinton is unfit to serve as President of the U.S. Her temperament is weak and her opponents are strong. BAD JUDGEMENT!
The Cruz-Kasich pact is under great strain. This joke of a deal is falling apart, not being honored and almost dead. Very dumb!
substring(text, first, last)
text = tweets, first = match_first, last = match_last, where
match_first: list(-1, -1, 20, 134, c(28, 95), c(87, 114), c(50, 112, 123))
match_last: list(-3, -3, 24, 137, c(33, 98), c(90, 119), c(53, 115, 126))
https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html
pmap(list(text = tweets, first = match_first, last = match_last), substring)
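A self-contained sketch of the whole substring pipeline on three of the tweets above. The regex is an assumption (a short list of words that reproduces the positions shown for these tweets); `gregexpr()` gives match starts (-1 when there is no match) plus a `match.length` attribute, end = start + length - 1, and then `pmap()` feeds each (text, first, last) tuple to `substring()` by argument name.

```r
library(purrr)

tweets <- c(
  "My economic policy speech will be carried live at 12:15 P.M. Enjoy!",
  'The media is going crazy. They totally distort so many things on purpose. Crimea, nuclear, "the baby" and so much more. Very dishonest!',
  "Bernie Sanders started off strong, but with the selection of Kaine for V.P., is ending really weak. So much for a movement! TOTAL DISRESPECT"
)
regex <- "crazy|joke|strong|weak|dumb|dead"   # assumed word list

matches <- gregexpr(regex, tweets)
# start positions, attributes stripped (-1 = no match)
match_first <- map(matches, as.vector)
match_length <- map(matches, ~ as.vector(attr(.x, "match.length")))
# end position = start + length - 1
match_last <- map2(match_first, match_length, ~ .x + .y - 1)

words <- pmap(list(text = tweets, first = match_first, last = match_last), substring)
words
```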