Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Cagatay Turkay, Aidan Slingsby, Kaisa Lahtinen, Sarah Butt and Jason Dykes

giCentre & Centre for Comparative Social Surveys at City University London

ESANN 2016, 29 April 2016


Page 1: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Enhancing a Social Science

Model-building Workflow with

Interactive Visualisation

Cagatay Turkay, Aidan Slingsby, Kaisa Lahtinen, Sarah Butt and Jason Dykes

giCentre & Centre for Comparative Social Surveys at City University London

ESANN 2016, 29 April 2016

Page 2: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

“We (social scientists) need (data-based)

models that we can understand and

explain so that we can defend them to

our peers in full confidence.”

A quote that motivates this work (from collaborators within our ADDResponse project)

Image from: Lahtinen, K. et al. (2015). Informing Non-Response Bias Model Creation in Social Surveys with Visualisation. Poster, VIS 2015

Page 3: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Numerical models to predict phenomena or act as a simulation of the phenomena being investigated

Good predictive power is often desired in models, BUT (in some fields) explanatory power is also crucial (see Shmueli, 2010 [*] for a detailed discussion)

[*] Shmueli, Galit. "To explain or to predict?" Statistical Science (2010): 289-310.

Page 4: Enhancing a Social Science Model-building Workflow with Interactive Visualisation
Page 5: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

ADDResponse Project -- https://blogs.city.ac.uk/addresponse/

… utilise organically generated auxiliary data (from commercial transactions, public administration and other sources) to understand propensity to respond and eventually tackle nonresponse bias (i.e., respondents differing from nonrespondents).

Page 6: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

ADDResponse - Details

• European Social Survey (ESS) UK 2012-13

• 4,520 households

• linked to auxiliary data from:

• administrative sources

• commercial consumer profiling

• open-source data

• 401 auxiliary variables

• 32 survey response variables (only for the respondents)

e.g., Proportion of house-sharing adults

e.g., Sports facilities within walking distance

Page 7: Enhancing a Social Science Model-building Workflow with Interactive Visualisation
Page 8: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Existing workflow

• Iteratively add and/or remove variables from a logistic regression model

• Assess the changes through model fitness metrics (e.g., AIC, McFadden's pseudo-R²)

• Put up a sticker!

• Highly manual but involved!
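The manual loop described on this slide can be sketched roughly as follows. This is an illustration on synthetic data, not the project's code: the fitting routine is a plain Newton-Raphson logistic regression written with NumPy, and the names `fit_logit`, `fitness`, `data` and `responded` are assumptions introduced here.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, gradient)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, loglik

def fitness(X, y):
    """Return (AIC, McFadden pseudo-R^2) for a model with predictors X."""
    beta, loglik = fit_logit(X, y)
    _, loglik_null = fit_logit(np.empty((len(y), 0)), y)  # intercept-only model
    return 2 * len(beta) - 2 * loglik, 1.0 - loglik / loglik_null

# Synthetic stand-in data (not the ESS data): columns 0 and 1 drive response, column 2 is noise.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))
logit = 0.5 + 1.0 * data[:, 0] - 1.0 * data[:, 1]
responded = (rng.random(500) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# One pass of the workflow: try several variable sets and compare the metrics.
results = {}
for columns in [(2,), (0,), (0, 1), (0, 1, 2)]:
    results[columns] = fitness(data[:, list(columns)], responded)
    aic, r2 = results[columns]
    print(f"variables {columns}: AIC={aic:.1f}, McFadden={r2:.3f}")
```

Lower AIC and higher McFadden pseudo-R² indicate a better fit. Because AIC penalises each added coefficient, adding the noise column may raise the pseudo-R² slightly without improving the AIC.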

Page 9: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Key roles for interactive visualisation

• Incorporating Theory

• Exploring variables

• Interactively building models

• Considering Geography

• Recording the model-building process, i.e., provenance

VarXplorer ModelBuilder

Page 10: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Prototype-1: VarXplorer

Co-variation plot

Correlations with

indicators

Theory-related

meta-data

Interactive

modelling

Page 11: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Link to the Video: http://goo.gl/XNiOIX

Page 12: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Exploring variables – 1: Investigate Covariation

- Compute pairwise correlations between all 401 variables

- Use these as a distance matrix and project to 2D (using MDS)

- Visualise on a scatterplot where each point is a variable
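A minimal sketch of these three steps, under stated assumptions: the slide does not say how correlations are turned into distances, so `1 - |r|` is used here as one common choice, and classical (Torgerson) MDS stands in for whichever MDS variant the prototype uses. The data are synthetic and the function name `classical_mds` is introduced for illustration.

```python
import numpy as np

def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS: place points so Euclidean distances approximate dist."""
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (dist ** 2) @ centring   # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dims]            # largest eigenvalues first
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Synthetic stand-in: 6 variables, the first 3 driven by one hidden factor, the last 3 by another.
rng = np.random.default_rng(1)
factors = rng.normal(size=(300, 2))
variables = np.column_stack(
    [factors[:, j // 3] + 0.3 * rng.normal(size=300) for j in range(6)]
)

corr = np.corrcoef(variables, rowvar=False)   # 6 x 6 pairwise correlations
dist = 1.0 - np.abs(corr)                     # assumed conversion to a dissimilarity
coords = classical_mds(dist)                  # one 2D point per variable
print(np.round(coords, 2))
```

Variables that co-vary strongly land close together in the resulting scatterplot, so clusters of points suggest groups of largely redundant variables.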

Page 13: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Exploring variables – 2: Correlation with indicators

- Compute correlations with all 32 response variables + response rate

- Use these as meta-data on variables to check whether they relate to indicators
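As a rough illustration, the correlation meta-data could be held as one row of coefficients per auxiliary variable. The variable names, the `trust` indicator, the toy values and the 0.6 threshold below are all made up for this sketch. In the real data the survey indicators exist only for respondents, so those correlations would be computed on respondent rows only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Toy values, one entry per responding household (hypothetical names and numbers).
auxiliary = {"aux_a": [1, 2, 3, 4, 5], "aux_b": [5, 3, 4, 1, 2]}
indicators = {"happiness": [2, 4, 5, 4, 5], "trust": [3, 3, 4, 2, 5]}

# Meta-data: for every auxiliary variable, its correlation with every indicator.
metadata = {
    name: {ind: pearson(values, ind_values) for ind, ind_values in indicators.items()}
    for name, values in auxiliary.items()
}

# Variables that relate to a chosen indicator, using an arbitrary threshold.
related = [name for name, row in metadata.items() if abs(row["happiness"]) >= 0.6]
print(metadata)
print(related)
```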

Page 14: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Incorporating Theory-related data

- Associate variables to social-science

concepts and theory

- Concepts relate to theories

- Variables act as proxies for concepts

- Use these as meta-data on variables

and visualise through histograms

Concepts, e.g., deprivation or quality of life

Theories, e.g., social isolation or social disorganisation
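One way to picture the theory-related meta-data is as two lookup tables, variable to concept and concept to theory, from which histogram counts are derived for the current selection. The variable names and the specific pairings below are hypothetical placeholders, not the project's actual coding.

```python
from collections import Counter

# Hypothetical meta-data: names and pairings are made up for illustration.
variable_concepts = {
    "aux_var_1": "deprivation",
    "aux_var_2": "deprivation",
    "aux_var_3": "quality of life",
    "aux_var_4": "quality of life",
    "aux_var_5": "quality of life",
}
concept_theories = {
    "deprivation": "social disorganisation",
    "quality of life": "social isolation",
}

def concept_histogram(selected):
    """Count how many selected variables act as proxies for each concept."""
    return Counter(variable_concepts[name] for name in selected)

selection = ["aux_var_1", "aux_var_3", "aux_var_4"]
concept_counts = concept_histogram(selection)

# Roll the concept counts up to the theory level.
theory_counts = Counter()
for concept, count in concept_counts.items():
    theory_counts[concept_theories[concept]] += count

# Text histogram of the current selection.
for label, count in concept_counts.most_common():
    print(f"{label:<16} {'#' * count}")
```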

Page 15: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Prototype-2: ModelBuilder

Variable selection

Model provenance

Interactive modelling

(through R)

Model quality

metrics

Page 16: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Prototype-2: ModelBuilder

Link to the Video: http://goo.gl/itUlm2

Page 17: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Interactively building models & evaluating them

- R scripts are called with the variable selections and the variable to predict (response or ESS variable)

- Quality metrics (AIC, McFadden) & variable weights visualised

Interactive model building

also in VarXplorer

with variable weights
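The slides do not show how the R scripts are invoked, so the following is only a guess at the shape of such a call: the selection is turned into an R model formula and handed to `Rscript` as a command-line argument. The script name `fit_model.R` and the variable names are placeholders.

```python
def r_formula(target, predictors):
    """Build an R model formula such as 'responded ~ aux_a + aux_b'."""
    if not predictors:
        return f"{target} ~ 1"  # intercept-only model
    return f"{target} ~ " + " + ".join(predictors)

def rscript_command(script, target, predictors):
    """Argument list for launching an R script; the script name is a placeholder."""
    return ["Rscript", script, r_formula(target, predictors)]

command = rscript_command("fit_model.R", "responded", ["aux_a", "aux_b"])
print(command)

# To execute (requires R and a real script on disk), something like:
#   import subprocess
#   subprocess.run(command, capture_output=True, text=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with the spaces in the formula.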

Page 18: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Considering Geography

- Facet data (geographically) into 12 regions

- Build local models

- Evaluate locally
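The facet, fit and evaluate steps can be sketched as a group-by over a region label. The example below uses three synthetic regions instead of the 12 on the slide, a single predictor, and McFadden's pseudo-R² as the local quality measure. The region names, effect sizes and the helper `mcfadden_r2` are assumptions made for illustration.

```python
import numpy as np

def mcfadden_r2(x, y, iters=25):
    """McFadden pseudo-R^2 of a one-predictor logistic model fitted by Newton-Raphson."""
    X = np.column_stack([np.ones(len(y)), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += np.linalg.solve(X.T @ (X * (p * (1.0 - p))[:, None]), X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    base = y.mean()  # the intercept-only model predicts the base rate
    loglik_null = np.sum(y * np.log(base) + (1.0 - y) * np.log(1.0 - base))
    return float(1.0 - loglik / loglik_null)

# Synthetic data: the predictor matters a lot in region_A, not at all in region_B.
rng = np.random.default_rng(2)
effects = {"region_A": 2.0, "region_B": 0.0, "region_C": 1.0}
region = np.repeat(list(effects), 400)
x = rng.normal(size=region.size)
slope = np.array([effects[r] for r in region])
y = (rng.random(region.size) < 1.0 / (1.0 + np.exp(-slope * x))).astype(float)

# Facet by region, build one local model per facet, evaluate each locally.
local_fit = {r: mcfadden_r2(x[region == r], y[region == r]) for r in effects}
for name, r2 in local_fit.items():
    print(f"{name}: McFadden={r2:.3f}")
```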

Page 19: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Model provenance & annotations

- Save and analyse the model-building

trail

- Mark dead-ends and good models

- Attach notes to models
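A provenance trail of this kind can be represented as a set of model records, each pointing at the model it was derived from. The sketch below is one possible data structure, not the ModelBuilder implementation. The class names, status labels and AIC values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class ModelRecord:
    model_id: int
    variables: Tuple[str, ...]
    aic: float
    parent: Optional[int] = None   # the model this one was derived from
    status: str = "candidate"      # "candidate", "good" or "dead-end"
    notes: List[str] = field(default_factory=list)

class Trail:
    """Stores every model tried, so the building process can be replayed and analysed."""

    def __init__(self) -> None:
        self.records: Dict[int, ModelRecord] = {}

    def add(self, variables, aic, parent=None) -> int:
        model_id = len(self.records)
        self.records[model_id] = ModelRecord(model_id, tuple(variables), aic, parent)
        return model_id

    def mark(self, model_id, status, note=None) -> None:
        record = self.records[model_id]
        record.status = status
        if note:
            record.notes.append(note)

    def lineage(self, model_id) -> List[int]:
        """Return the ids from the first model in the chain down to model_id."""
        path = []
        current = model_id
        while current is not None:
            path.append(current)
            current = self.records[current].parent
        return path[::-1]

trail = Trail()
root = trail.add(["aux_a"], aic=610.0)                          # placeholder AIC values
step = trail.add(["aux_a", "aux_b"], aic=585.0, parent=root)
dead = trail.add(["aux_a", "aux_c"], aic=612.0, parent=root)
trail.mark(dead, "dead-end", "aux_c adds nothing")
trail.mark(step, "good")
print(trail.lineage(step))
```

Keeping the parent link on each record is what makes it possible to walk back from a good model to the choices that led to it, and to see where a branch was abandoned.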

Page 20: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

A brief example of the modelling process

1. Select two concepts, economic circumstances and quality of life

Page 21: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

A brief example of the modelling process

2. Select variables

that are distinct

and relevant

Page 22: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

A brief example of the modelling process

3. Select variables

that correlate

with an ESS

indicator

(happiness)

3.1 Observe that

they relate to

“Social Isolation”

Page 23: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

A brief example of the modelling process

4. Use these variables as a

starting point, check local

variations and plug into

existing scripts

4.1 Model performs

“better” in South-East UK

and in Greater London

Page 24: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Lessons learned

• Enhanced analysis through informed use of computation

• Interactive visual methods improve reliability and

interpretability

• Improved trust in models

• Tight integration enables quick hypothesis prototyping

• Important to communicate the certainty of the findings

Page 25: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Looking into the future

• Explanatory models, not only predictive models

• Incorporating more complex methods (already

incorporated random forests)

• Other ways to make models more accessible?

• Use models & findings as scientific evidence?

Page 26: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Acknowledgments

• giCentre team @ City

• ADDResponse project funded by the UK Economic

and Social Research Council (grant ES/L013118/1)

Page 27: Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Thank you!

@cagatay_turkay

http://staff.city.ac.uk/cagatay.turkay.1/

https://blogs.city.ac.uk/addresponse/

http://www.gicentre.net/

!! We are hiring !!

* Researcher in visualisation of cyber-security data

(H2020 funded RIA)

* PhD studentships

Deadlines in late May and June

check giCentre.net