TrelliscopeJS Hafen Consulting, LLC Purdue University @hafenstats Ryan Hafen http://bit.ly/trelliscopejs1 Modern Approaches to Data Exploration with Trellis Display

Source: stats.research.att.com › nycseminars › slides › hafen.pdf


Page 1:

TrelliscopeJS

Modern Approaches to Data Exploration with Trellis Display

Ryan Hafen
@hafenstats
Hafen Consulting, LLC
Purdue University

http://bit.ly/trelliscopejs1

Page 2:

All examples in this talk are reproducible after installing and loading the following packages:

    install.packages(c("tidyverse", "gapminder", "rbokeh", "visNetwork", "plotly"))
    devtools::install_github("hafen/trelliscopejs")

    library(tidyverse)
    library(gapminder)
    library(rbokeh)
    library(visNetwork)
    library(trelliscopejs)

Page 3:

TrelliscopeJS is an htmlwidget

TrelliscopeJS is a layout engine for collections of htmlwidgets

TrelliscopeJS is a framework for creating interactive displays of small multiples

Page 4:

Small Multiples

A series of similar plots, usually each based on a different slice of data, arranged in a grid

"For a wide range of problems in data presentation, small multiples are the best design solution." Edward Tufte (Envisioning Information)

This idea was formalized and popularized in S/S-PLUS and subsequently R with the trellis and lattice packages

Page 5:

Advantages of Small Multiple Displays

Avoid overplotting

Work with big or high dimensional data

It is often critical to the discovery of a new insight to be able to see multiple things at once

Our brains are good at perceiving simple visual features like color or shape or size and they do it amazingly fast without any conscious effort

We can tell immediately when a part of an image is different from the rest, without really having to focus on it

In my experience, small multiples are much more effective than more flashy things like animation, linked brushing, custom interactive vis, etc.

Page 6:

Trelliscope: Interactive Small Multiple Display

Small multiple displays are useful when visualizing data in detail

But the number of panels in a display can be potentially very large, too large to view all at once

It can also be difficult to specify a meaningful order in which panels are displayed

Trelliscope is a general solution that allows small multiple displays to come alive by providing the ability to interactively sort and filter the panels based on summary statistics, cognostics, automatically computed for each panel
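
As a rough illustration of the idea, the sketch below computes a few summary statistics per panel and then sorts and filters on them, which is what a Trelliscope viewer lets you do interactively. It uses only base R and the built-in ChickWeight data, neither of which appears in the talk; the cognostic names n_obs, final_wt and growth are made up for the example.

```r
# One "panel" per chick: split the built-in ChickWeight data by chick id.
# ChickWeight and the cognostic names below are illustrative assumptions.
cw <- as.data.frame(ChickWeight)
cw$Chick <- as.character(cw$Chick)
panels <- split(cw, cw$Chick)

# Compute cognostics: one row of summary statistics per panel.
cogs <- do.call(rbind, lapply(names(panels), function(id) {
  d <- panels[[id]]
  data.frame(
    chick    = id,
    n_obs    = nrow(d),
    final_wt = d$weight[which.max(d$Time)],
    growth   = unname(coef(lm(weight ~ Time, data = d))[2]),
    stringsAsFactors = FALSE
  )
}))

# "Sort" the panels by a cognostic (fastest growth first) ...
sorted <- cogs[order(-cogs$growth), ]

# ... and "filter" them down to the panels of interest.
complete <- sorted[sorted$n_obs == 12, ]
head(complete, 3)
```

Each row of cogs describes one panel, so ordering or subsetting the rows is the same as ordering or subsetting the plots.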

Page 7:

TrelliscopeJS

JavaScript Library: trelliscopejs-lib

Built using React

Pure JavaScript

Interface agnostic

R Package: trelliscopejs

htmlwidget interface to trelliscopejs-lib

Evolved from CRAN "trelliscope" package (part of DeltaRho project)

Page 8:

Gapminder Example

Suppose we want to understand mortality over time for each country

    glimpse(gapminder)

    Observations: ...
    Variables: ...
    $ country   <fctr> Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
    $ continent <fctr> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, ...
    $ year      <int> ...
    $ lifeExp   <dbl> ...
    $ pop       <int> ...
    $ gdpPercap <dbl> ...

https://www.gapminder.org/

Page 9:

    qplot(year, lifeExp, data = gapminder, color = country, geom = "line")

Yikes! There are a lot of countries...

Page 10:

    qplot(year, lifeExp, data = gapminder, color = continent,
      group = country, geom = "line")

I can't see what's going on...

Page 11:

    qplot(year, lifeExp, data = gapminder, color = continent,
      group = country, geom = "line") +
      facet_wrap(~ continent, nrow = ...)

That helped a little...

Page 12:

    p <- qplot(year, lifeExp, data = gapminder, color = continent,
      group = country, geom = "line") +
      facet_wrap(~ continent, nrow = ...)

    plotly::ggplotly(p)

This helps but there is still too much overplotting...

(and hovering for additional info is too much work and we can only see more info one at a time)

Page 13:

    qplot(year, lifeExp, data = gapminder) +
      xlim(1948, 2011) + ylim(10, 95) + theme_bw() +
      facet_wrap(~ country + continent)

Page 14:

From ggplot2 Faceting to Trelliscope

Turning a ggplot2 faceted display into a Trelliscope display is as easy as changing:

facet_wrap()

or:

facet_grid()

to:

facet_trelliscope()

Page 15:

    qplot(year, lifeExp, data = gapminder) +
      xlim(1948, 2011) + ylim(10, 95) + theme_bw() +
      facet_trelliscope(~ country + continent, nrow = 2, ncol = 7, width = 300)

Page 16:

    qplot(year, lifeExp, data = gapminder) +
      xlim(1948, 2011) + ylim(10, 95) + theme_bw() +
      facet_trelliscope(~ country + continent,
        nrow = 2, ncol = 7, width = 300, as_plotly = TRUE)

Page 17:

Plotting in the Tidyverse

Page 18:

Gapminder Example from "R for Data Science"

    country_model <- function(df)
      lm(lifeExp ~ year, data = df)

    by_country <- gapminder %>%
      group_by(country, continent) %>%
      nest() %>%
      mutate(
        model = map(data, country_model),
        resid_mad = map_dbl(model, function(x) mad(resid(x))))

    by_country

    # A tibble: ...
    country      continent  data    model   resid_mad
    <fctr>       <fctr>     <list>  <list>  <dbl>
    Afghanistan  Asia       tibble  lm      ...
    Albania      Europe     tibble  lm      ...
    Algeria      Africa     tibble  lm      ...
    Angola       Africa     tibble  lm      ...
    Argentina    Americas   tibble  lm      ...
    Australia    Oceania    tibble  lm      ...
    Austria      Europe     tibble  lm      ...
    Bahrain      Asia       tibble  lm      ...
    Bangladesh   Asia       tibble  lm      ...
    Belgium      Europe     tibble  lm      ...
    # ... with more rows

Example adapted from "R for Data Science"

One row per group

Per-group data and models as "list-columns"

Page 19:

Plotting the Fit for Each Country

Excerpt from "R for Data Science"

Page 20:

Plotting the Data and Model Fit for a Group

We'll use the rbokeh package to make a plot function and apply it to the first row of our data

    country_plot <- function(data, model) {
      figure(xlim = c(1948, 2011), ylim = c(10, 95), tools = NULL) %>%
        ly_points(year, lifeExp, data = data, hover = data) %>%
        ly_abline(model)
    }

    country_plot(by_country$data[[1]], by_country$model[[1]])

Page 21:

Let's Apply This Function to Every Row!

    by_country <- by_country %>%
      mutate(plot = map2_plot(data, model, country_plot))

    by_country

    # A tibble: ...
    country      continent  data    model   resid_mad  plot
    <fctr>       <fctr>     <list>  <list>  <dbl>      <list>
    Afghanistan  Asia       tibble  lm      ...        rbokeh
    Albania      Europe     tibble  lm      ...        rbokeh
    Algeria      Africa     tibble  lm      ...        rbokeh
    Angola       Africa     tibble  lm      ...        rbokeh
    Argentina    Americas   tibble  lm      ...        rbokeh
    Australia    Oceania    tibble  lm      ...        rbokeh
    Austria      Europe     tibble  lm      ...        rbokeh
    Bahrain      Asia       tibble  lm      ...        rbokeh
    Bangladesh   Asia       tibble  lm      ...        rbokeh
    Belgium      Europe     tibble  lm      ...        rbokeh
    # ... with more rows

Plots as list-columns!!!

Page 22:

    by_country %>%
      trelliscope(name = "by_country_lm", nrow = ..., ncol = ...)

Page 23:

Recap: TrelliscopeJS in the Tidyverse

Create a data frame with one row per group, typically using Tidyverse group_by() and nest() operations

Add a column of plots

  TrelliscopeJS provides purrr map functions map_plot(), map2_plot(), pmap_plot() that you can use to create these

  You can use any graphics system to create the plot objects (ggplot2, htmlwidgets, lattice)

Optionally add more columns to the data frame that will be used as cognostics: metrics with which you can interact with the panels

  All atomic columns will be automatically used as cognostics

  Map functions map_cog(), map2_cog(), pmap_cog() can be used for convenience to create columns of cognostics

Simply pass the data frame into trelliscope()

With plots as columns, TrelliscopeJS provides nearly effortless detailed, flexible, interactive visualization in the Tidyverse

Page 24:

Order the data frame to set initial ordering of display

    by_country %>%
      arrange(-resid_mad) %>%
      trelliscope(name = "by_country_lm", nrow = ..., ncol = ...)

Page 25:

Filter the data to only include plots you want in the display

    by_country %>%
      filter(continent == "Africa") %>%
      trelliscope(name = "by_country_africa_lm", nrow = ..., ncol = ...)

Page 26:

Images as Panels

Page 27:

    pokemon <- read_csv("http://bit.ly/plot_pokemon") %>%
      mutate_at(vars(matches("_id$")), as.character) %>%
      mutate(panel = img_panel(url_image))

    pokemon

Showing 1 to 10 of 801 entries

        pokemon           id  species_id  height  weight  base_experience  type_1  type_2  attack
     1  bulbasaur          1           1       7      69               64  grass   poison      49
     2  ivysaur            2           2      10     130              142  grass   poison      62
     3  venusaur           3           3      20    1000              236  grass   poison      82
     4  venusaur-mega      4           3      24    1555              281  grass   poison     100
     5  charmander         5           4       6      85               62  fire                52
     6  charmeleon         6           5      11     190              142  fire                64
     7  charizard          7           6      17     905              240  fire    flying      84
     8  charizard-mega-x   8           6      17    1105              285  fire    dragon     130
     9  charizard-mega-y   9           6      17    1005              285  fire    flying     104
    10  squirtle          10           7       5      90               63  water               48

Page 28:

    trelliscope(pokemon, name = "pokemon", nrow = ..., ncol = ...,
      state = list(labels = c("pokemon", "pokedex")))

Page 29:

htmlwidgets as Panels

Page 30:

Example: Network Vis with visNetwork htmlwidget

    library(visNetwork)

    nnodes <- ...
    nnedges <- ...

    nodes <- data.frame(
      id = 1:nnodes,
      label = 1:nnodes,
      value = rep(..., nnodes))

    edges <- data.frame(
      from = sample(1:nnodes, nnedges, replace = ...),
      to = sample(1:nnodes, nnedges, replace = ...)) %>%
      group_by(from, to) %>%
      summarise(value = n())

    network_plot <- function(id, hide_select = ...) {
      style <- ifelse(hide_select,
        "visibility: hidden; position: absolute", "")

      visNetwork(nodes, edges) %>%
        visIgraphLayout(layout = "layout_in_circle") %>%
        visNodes(fixed = ...,
          scaling = list(min = ..., max = ...,
            label = list(min = ..., max = ...,
              drawThreshold = ..., maxVisible = ...))) %>%
        visEdges(scaling = list(min = ..., max = ...)) %>%
        visOptions(
          highlightNearest = list(enabled = ..., degree = ...,
            hideColor = "rgba(...)"),
          nodesIdSelection = list(selected = as.character(id), style = style))
    }

    network_plot(..., hide_select = ...)

Page 31:

Trelliscope display with one panel per node

We create a one-row-per-node data frame with number of nodes connected to and total number of connections as cognostics and add a plot panel column

    nodedat <- edges %>%
      group_by(from) %>%
      summarise(n_nodes = n(), tot_conns = sum(value)) %>%
      rename(id = from) %>%
      arrange(-n_nodes) %>%
      mutate(panel = map_plot(id, network_plot))

    nodedat

    # A tibble: ...
    id     n_nodes  tot_conns  panel
    <int>  <int>    <int>      <list>
    ...    ...      ...        visNetwork
    # ... with more rows

Page 32:

    nodedat %>%
      arrange(-n_nodes) %>%
      trelliscope(name = "connections", nrow = ..., ncol = ...)

Page 33:

Larger Trelliscope Displays

Page 35:

    instadf %>%
      arrange(-likes_count) %>%
      trelliscope(name = "posts", width = 320, height = 320,
        nrow = 3, ncol = 6,
        state = list(labels = c("caption", "post_link", "likes_count")))

Page 36:

Trelliscope Displays as Apps

Page 37:

Trelliscope Displays as Apps

Trelliscope displays are most useful as exploratory plots to guide the data scientist (because they can be created rapidly)

However, in many cases Trelliscope displays can be used as interactive applications for end-users, domain experts, etc. with the bonus that they are much easier to create than a custom app

If you have an app that has multiple inputs and produces a plot output, the idea is simply to enumerate all possible inputs as rows of a data frame and add the plot that corresponds to these parameters as a column and plot it
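
The step of enumerating all possible inputs as rows of a data frame can be sketched in base R with expand.grid(). The two inputs (mean and sd), their values, and the make_panel() helper are hypothetical and stand in for whatever inputs and plotting code a real app would have.

```r
# Hypothetical app inputs: every combination becomes one row (one panel).
inputs <- expand.grid(mean = c(-1, 0, 1), sd = c(0.5, 1, 2))

# Stand-in for the app's plotting code: returns a function that draws the
# panel for one combination of inputs when it is called.
make_panel <- function(mean, sd) {
  force(mean)
  force(sd)
  function() {
    curve(dnorm(x, mean, sd), from = -6, to = 6,
          main = sprintf("mean = %g, sd = %g", mean, sd))
  }
}

# Add the panels as a list-column, one entry per row of inputs.
inputs$panel <- Map(make_panel, inputs$mean, inputs$sd)

nrow(inputs)
```

In a Trelliscope display the mean and sd columns would then act as cognostics that the viewer filters on, taking the place of the app's input widgets.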

Page 38:

Gapminder Life Expectancy

    library(shiny)
    library(ggplot2)
    library(gapminder)

    server <- function(input, output) {
      output$countryPlot <- renderPlot({
        qplot(year, lifeExp,
          data = subset(gapminder, country == input$country)) +
          xlim(1948, 2011) + ylim(10, 95) + theme_bw()
      })
    }

    choices <- sort(unique(gapminder$country))

    ui <- fluidPage(
      titlePanel("Gapminder Life Expectancy"),
      sidebarLayout(
        sidebarPanel(
          selectInput("country", label = "Select country: ",
            choices = choices, selected = "Afghanistan")
        ),
        mainPanel(
          plotOutput("countryPlot", height = "500px")
        )
      )
    )

    runApp(list(ui = ui, server = server))

Page 39:

Scaling Trelliscope

Just because you can't look at all panels in a display doesn't mean it isn't useful or practical to make a large display. It's in fact beneficial because you get an unprecedented level of detail in your displays, and every corner of your data can be conceptually viewed

One insight is all you need for a display to serve a purpose (provided it is quick to create)

We used the previous implementation of Trelliscope to visualize millions of subsets of terabytes of data

Page 40:

What is needed to scale in the Tidyverse?

SparklyR is the natural solution

But we need a few things...

  SparklyR support for list-columns (nested data frames and arbitrary R objects)

  SparklyR support for remote procedure calls (run arbitrary R code on the data)

  Fast random access to rows of a SparklyR data frame

  A TrelliscopeJS deferred panel rendering scheme (render on-the-fly rather than all panels up front)
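
To make the last point concrete, here is a minimal base R sketch of deferred rendering: each panel is rendered, and cached, only when it is first requested. It illustrates the general idea only and is not TrelliscopeJS's actual scheme; make_deferred(), get_panel() and the stand-in render function are hypothetical.

```r
# Build a store of deferred panels. Nothing is rendered up front: a panel is
# produced by render_fun() the first time it is requested and then cached.
make_deferred <- function(keys, render_fun) {
  cache <- new.env()
  n_rendered <- 0
  get_panel <- function(key) {
    if (is.null(cache[[key]])) {
      cache[[key]] <- render_fun(key)
      n_rendered <<- n_rendered + 1
    }
    cache[[key]]
  }
  list(keys = keys, get_panel = get_panel,
       rendered = function() n_rendered)
}

# A display with many panels; the render function is a cheap stand-in for
# real plotting code.
display <- make_deferred(
  keys = sprintf("subset_%06d", 1:100000),
  render_fun = function(key) paste("panel for", key)
)

# Only the panels on the current "page" are rendered.
visible <- display$keys[1:6]
panels <- lapply(visible, display$get_panel)
display$rendered()
```

Requesting an already rendered panel again returns the cached result, so the cost of a display grows with the panels actually viewed rather than with the total number of panels.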

Page 41:

What's Next

trelliscopejs

  Automatic cognostics: automatically compute useful cognostics based on the context of what is being plotted (e.g. if a scatterplot has a model fit superposed, add model diagnostics cognostics)

  Automatic handling of axis limits: "same", "sliced", "free" (underway; currently "same" limits need to be hard-coded)

  When axes are "same", only show axes on plot margins instead of every panel (underway for ggplot2)
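
The three axis modes use the same names as the relation setting of lattice's scales argument. As a hedged sketch of what each mode would compute, the helper below derives per-panel limits from the data in each panel; panel_limits() and the example values are hypothetical and not part of trelliscopejs.

```r
# Compute axis limits for a list of numeric vectors, one vector per panel.
#   "same":   every panel uses the overall range
#   "sliced": every panel keeps its own centre but all share one axis width
#   "free":   every panel uses its own range
panel_limits <- function(values, relation = c("same", "sliced", "free")) {
  relation <- match.arg(relation)
  ranges <- lapply(values, range, na.rm = TRUE)
  switch(relation,
    same = {
      overall <- range(unlist(ranges))
      lapply(ranges, function(r) overall)
    },
    sliced = {
      width <- max(vapply(ranges, diff, numeric(1)))
      lapply(ranges, function(r) mean(r) + c(-0.5, 0.5) * width)
    },
    free = ranges
  )
}

# Made-up values for three panels.
y <- list(a = c(30, 45), b = c(60, 80), c = c(70, 75))
panel_limits(y, "same")
panel_limits(y, "sliced")
panel_limits(y, "free")
```

With "same" limits the panels are directly comparable, "sliced" keeps the scale comparable while letting each panel centre on its own data, and "free" maximizes resolution within each panel at the cost of comparability.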

trelliscopejs-lib

  More visual filters for cognostics (dates, geographic, bivariate relationships, etc.)

  Bookmarkable / sharable state

  View multiple panels side-by-side

  Support for receiving panels from other endpoints

Page 42:

For More Information

Twitter: @hafenstats

Blog: http://ryanhafen.com/blog

Documentation: http://hafen.github.io/trelliscopejs

Github: https://github.com/hafen/trelliscopejs