BDSS IGERT Speed Dating/Matchmaking Event September 19, 2014


John Beieler, PhD Student, Political Science

[email protected]

johnbeieler.org

Event Data
Who did what to whom

• Python, R

• Natural language processing

• Forecasting

• Political violence

Wanghuan Chu
•  4th-year Ph.D. student in Statistics
•  Key strength: Statistical modeling
   –  Nonparametric regressions, mixed-effects/multilevel models, discrete choice models, statistical learning algorithms, causal inference techniques, etc.
•  Research experience
   –  Thesis research: Feature screening methods for ultrahigh dimensional longitudinal data.
      •  e.g. Genetic data with 870,000 SNPs from 540 subjects (p >> n)
   –  1st IGERT rotation: Causal mediation analysis for clustering data using mixed-effects models, propensity score modeling and inverse probability weighting.

Wanghuan Chu

•  Potential components for the ideal project
   –  Parallel computing to Big Data (e.g. MapReduce).
   –  Statistical methodologies at data analytics layer.
   –  Interesting social science questions to be explored.
•  Software: R and SAS (MACRO and SQL)
•  Interested in learning
   –  New programming language (e.g. Python).
   –  New methodology and domain knowledge.

An Introduction: Cindy Cook cmc496@psu.edu

•  B.S. in Mathematics
   •  Graph Theory
   •  Parallel Computing with MPI: Recommender Systems
•  M.S. in Applied Statistics
   •  R, SAS, Stata, C++
   •  Machine Learning
   •  Survival Analysis: Cox Models
•  Ph.D. in Statistics
   •  No particular advisor or research

Research Interests:

•  Big Data
   ◦  With statistical applications in the Social Sciences
   ◦  Python, parallel computing in R, and broadening my overall computing skills
•  Data that has spatial/temporal trends
•  Networks on a large scale
•  Any combination of these

Timmy Huynh ■ Sociology & Demography
Advisor: John Iceland
[email protected]

Education

B.A., Geography / Economics, The University of Texas at Austin, 2010

M.A., Social Sciences, The University of Chicago, 2011

Research experience (selected)

REU Summer Institute in Minority Group Demography – Austin, TX, 2009

Summer Institute in LGBT Population Health – Boston, MA, 2010

Asian Americans Advancing Justice – Chicago, IL, 2011-2012

Oak Ridge National Laboratory – Oak Ridge, TN, 2012-2013

Research interests

Urban sociology

Spatial demography

Economic geography

Networks

(Geo)Visualization

Skills

Statistics (Stata, SPSS, R)

GIS (ArcGIS, GeoDa, ERDAS)

Programming (Python, JavaScript)

Christopher Inkpen
Sociology and Demography

Recent Projects

- determinants of student migration
- visualizing global migration patterns
- assessing impact of recession on internal migration

Tools

- Statistical Models: linear regression, GLM, HLM, fixed and random effects, spatial econometrics
- Computing: Stata, R, Python, SQL
- Mapping: ArcGIS, CartoDB

Broad Interests

- global migration patterns
- assimilation
- population processes


Areas to explore

Population estimation and data fusion

Mapping of social networks


Department of Human Development and Family Studies

Rachel Koffer [email protected]

3rd year Ph.D. student in the Department of Human Development and Family Studies
Concentrations: Individual Development, Methodology
Advisors: Nilam Ram, David Almeida
B.A. Psychology, Economics; Minor: Environmental Studies

Skills I Bring to the Rotation: SAS, R, LISREL, SPSS, STATA
Statistical skills: General linear, multilevel, structural equation modeling, PCA and Factor Analysis

Skills I Hope to Develop/Improve During the Rotation: Python; Data visualization, Machine Learning

Methodological Interests: Analysis of:
- Intensive longitudinal data (many measurements across short time span)
- Multiple time scales (intensive longitudinal data within longer-term data)

Substantive Interests:
- Association between daily experiences and well-being
- Effects of daily stressors on daily and long-term affective (mood) and physical well-being

Potential Interests for Research Rotation:
- Machine learning techniques for developmental time series data
- Application of interdisciplinary methods to stress concepts


Fridolin Linder

Department: Political Science (2nd Year PhD)

Fields: Methodology, Comparative Politics, Statistics (Grad. Minor)

Interests: Predictive Modeling/Machine Learning, Text Analysis (Classification, Scaling), Political Representation, Research Design/Causal Inference/Epistemology

Skills: Statistics, R (substantial), Python

Current Projects: Data mining as Exploratory Data Analysis (w/ Zach Jones), Rationalization of candidate choice through misreporting of ideological self-placement (experiment)

Jonathan K. Nelson
Department of Geography

[email protected]

Abstract: I am a Ph.D. student in the department of geography. Prior to coming to Penn State I was a cartographer for National Geographic. I study spatial data representation and explore patterns and relationships in geographic phenomena, using spatial statistics and visual analytics approaches. I am particularly interested in interactive multi-scale visual and data abstraction techniques for making sense of BIG DATA.

My current research rotation is in the GeoVISTA Center and involves leveraging geo-social media data to support crisis management. Other projects I am working on include: a visual analysis of 1200 student maps from a massive open online course titled “Maps and the Geospatial Revolution”; an exploratory analysis of multiscalar effects of the modifiable areal unit problem on cancer diagnosis rates and median income; and a human-pet-computer interaction study that aims to build healthy relationships between pet owners and their dogs using personal visualization and quantification.

Tools I commonly use for carrying out and conveying my research include: Adobe Creative Suite, Avenza MAPublisher, Final Cut Pro; ESRI ArcGIS, GeoDa, R; CSS, HTML, JavaScript D3.

Keywords: spatial data, visualization, cartography, map, scale, aggregation, information design

Deeper Learning in Large-Scale Text

INTELLIGENT SYSTEMS LABORATORY

APPLIED COGNITIVE SCIENCES LABORATORY

Alexander G. Ororbia II, IST PhD Student


What do I do?

Build: Deep models for learning from Scholarly Big Data

Multilayer neural networks, learning kernels

Boltzmann Machines

Convolutional Networks—text recognition in-the-wild (“Text in the Wild”)

Active Learning Algorithms

Bayesian Network Lattice for error-correcting Amazon Mechanical Turker annotations (Ororbia et al., 2014, Under Review)

Investigate: Can deep architectures discover/model inherent hierarchical structure in text?

How can intelligent systems work in tandem with humans to solve complex problems?

Can intelligent tools be built that harvest and organize vast amounts of scholarly data?

What insights can these same algorithms extract from the data?


BDSS-IGERT Joshua Snoke

[email protected]

• About Me:

• 2nd Year PhD Student, Department of Statistics

• 1st Year BDSS-IGERT Trainee

• B.S. in Mathematics and Economics

• Current Research:

• Data Privacy, Disclosure Limitation Methods

• Synthetic Data for Public Use (in Sociology Studies), Parametric and Non-Parametric Methods


BDSS-IGERT Joshua Snoke

[email protected]

• Currently Seeking a Research Project Outside of the Statistics Dept.

• Interests and Applications:

• Policy, Politics (National and Global)

• Social Networks, Relationships

• Methodology, Causal Inference, Bayesian Methods

• Computational Proficiency:

• Significant Experience in R

• Some Experience in Python, Java, and SQL


Sam Stehle Geography

Previous activities

• Data management for mobile GIS

• Matching space-time patterns for political/social comparison

• Visual analytic software design/implementation

• Text classification of Twitter data

• Event data collection/classification

• Time series intervention modeling

Methods experience

• Java
• Python
• C++
• R
• Spatial analysis, GIS
• SQL
• Time series analysis
• Raster/image analysis
• Machine learning with Weka

Background

• B.S. University of Utah; geography, minor in Computer Science

• M.S. Penn State; geography

Sam Stehle: Interests for Future Work

Topics

• Political geography

• Understanding events

• Spatio-temporal patterns

• Sport

• Media representations

• Multi-scale issues

Dissertation considerations

• Geography/politics of international sport

• British Commonwealth Games

• Multi-scale spatio-temporal modeling

• RSS feed data + geo/social/political context

• Data-driven vs. dictionary event classification

• Spatio-temporal diffusion patterns


Clio Andris (Assistant Professor of GIScience) Dept. of Geography, [email protected].

Courses: Fall GEOG560: Interpersonal Relationships in Geographic Space, Spring GEOG363: GIS

A System of Systems

[Figures: example network images, labeled Example 2, Example 3, and Example 4. Network thanks to Paul Hooper, Emory U.]

MID/DLE: An End to Data Collection

[Diagram labels: Researcher, Computer, Collect, Analyze, Maintain; three Participants, each with three Measures]

Tim Brick, HDFS

[Figure: Estimated Yearly Average of Two Dynamic Latent Physical Integrity Variables, 1950 to 2010. Vertical axis: Latent Physical Integrity, from More Abuse to More Respect. Series: Dynamic Standard of Accountability; Constant Standard of Accountability.]

[Figure: Dynamic Standard Model Estimates plotted against Constant Standard Model Estimates, one panel per year from 1976 to 2010; both axes run from More Abuse to More Respect. Annotation: Disagreement between estimates increases each year.]

Christopher J. Fariss Respect for Human Rights has Improved Over Time

Human Rights Documentation Project

11,715 human rights documents. Eventually I will have more than 20,000 documents. Most are already coded.

Epidemics – The Dynamics of Infectious Disease

•  > 19,000 participants
•  158 countries
•  > 5,000 completed
•  > 300,000 unique video views
•  > 8000 browsed the forums
•  > 4200 participating in forums
•  3683 forum threads
•  31919 forum posts
•  15486 forum comments

•  Novel format for delivering content
•  Novel format for interacting with learners
•  The online discussion is an educational resource

Ferrari, BIOL


Chris Fowler
Assistant Professor of Geography and Demography

[email protected]

The big question: When cities spend money on stuff (e.g. affordable housing, transport systems, parks)…
…who suffers, who benefits?
…how do those costs and benefits change communities?

The current question: Can we use demographic data at very fine geographic scales to identify signals that relate to the above?

The Measure: Multiscale segregation

[Figure: segregation profile. Vertical axis: More segregated to More diverse. Horizontal axis: Smallest Scale to Largest Scale. Labels: Completely Segregated, Very Diverse, Segregated.]

Multiscale Segregation Profiles: Functional Forms

[Figure: segregation profiles. Vertical axis: More segregated to More diverse.]

18,000 cells x 25 scales x 3 census years = 1.35 million data points to study 8 neighborhoods

Presenter notes: 25 rasters like the one to the left make up the graphic on the right. If segregation and diversity are enfolded, then the segregation profile may be a way to represent that. So what does enfolding look like?

Multiscale segregation is just the start

• Improving the measure

• Scale and interpolation issues

• Restricted Census data

• Other variables

• Parcel data on housing price

• Poverty status

• Income

• Other cities

• Visualization and classification

• Interpretation and prediction


GeoTxt

SensePlace2

Frank Hardisty ([email protected]) at the GeoVISTA Center

People and Ideas

GeoVISTA Student Affiliates

• John Beieler
• Jennifer Mason
• Jonathan Nelson
• Sam Stehle
• Josh Stevens

Project Ideas

• Interactive NLP
• GeoTxt crowd-sourcing
• Interactive social graph analysis
• Your pet idea bridging social science and geo-science


The Psychometrics of College Tests

Loken, HDFS


Signal and noise in data on body weight


Stephen A. Matthews
Professor of Sociology, Anthropology & Demography (courtesy, Geography)
Director, Graduate Program in Demography

Research Interests: My research focuses on population health and health inequality. An important part of my work is an interest in conceptual and methodological issues associated with how neighborhoods are defined and their attributes are measured, and the relevance of these definitions and measures to individual behavior and health outcomes.

Proposed Research Project: A friend and colleague, Basile Chaix (Université Pierre et Marie Curie, Paris), has geocoded data on 90,000 places (activity locations) for 6,000 Parisians. For each respondent we know the self-reported boundaries of their neighborhood (VERITAS-RECORD project). During the project the emphasis would be to develop/refine methods to:
(a) compare the patterning of location visits to the self-reported neighborhood;
(b) compare patterns across individuals residing in the same neighborhood;
(c) identify hierarchical use patterns (frequency) among location types;
(d) examine the significance of focal locations (e.g., work, home); and
(e) determine the optimal/minimal number (and type) of locations reported that offer a useful proxy for the total distribution of locations that an individual visits.

Stephen A. Matthews – [email protected] – BDSS Speed-dating Meeting (Fall 2014) Slide 01


VERITAS-RECORD = Visualization and Evaluation of Route Itineraries, Travel Destinations, and Activity Spaces – Residential Environment and CORonary Heart Disease. Youtube VIDEO at https://www.youtube.com/watch?v=91x_S2Q-tic

Stephen A. Matthews – [email protected] – BDSS Speed-dating Meeting (Fall 2014) Slide 02

Ideal Skill Sets Required:
a) Good data organizing skills
b) Good communication skills
c) Good documentation skills
d) Solid statistical background
e) Programming skills (for automating repetitive tasks)
f) Patience
g) Mapping and data visualization skills
h) GIS experience, preferably ArcGIS
i) Some familiarity with point pattern analysis, local neighborhood statistics, and density/surface mapping
j) Some familiarity with activity space and time-geography literature, and willingness to learn more

Opportunities to be involved in manuscripts to be developed for publication in epidemiology, public health and/or geography-related journals

Quantitative interest

Process models
Multivariate continuous time modeling with all driving parameters person-specific
- unbalanced/unstructured data
- describing change in terms of instantaneous regulation and short- and long-term trends
- synchronicity in changes among the longitudinal variables

Bayesian statistics
- flexible framework for implementing parameter estimation for highly complex models
- focus: sequential updating methods for online inference from streaming data (health monitors, Twitter etc.)

bayesian.zitaoravecz.net

Main goal: developing novel multivariate dynamical models that capture psychologically meaningful properties of change over time in terms of latent variables
→ study individual differences therein
→ e.g., interventions can be tailored based on these variables

Substantive interest

Affective science

- regulatory mechanisms in valence and arousal levels

- their connection to personality traits

Well-being

- subjective (self-reported) wellbeing as multidimensional state

- devising measurement instrument

Cognitive process models

- describing decision making in terms of latent variables

- identifying links between cognitive parameters and individual characteristics

Research tools

- translating methodological research into practical tools

- free, user-friendly programs for applied researchers

bayesian.zitaoravecz.net

[Figure: plot with axes Valence and Arousal, each scaled 0 to 100]

Donna Peuquet
Department of Geography

Research interests:
• Geographic knowledge representation
• Knowledge discovery
• Space-time dynamics
  – Visualization
  – Geovisual analytics
  – Data models
  – Computational/statistical modes

Current project: STempo
• Provide the capability to:
  – quickly reveal temporal and spatio-temporal patterns from large collections of event data
    • Find previously unknown patterns
    • Confirm suspected / assumed patterns
    • Find examples of specific patterns in other locations / times
• Using computational + visualization techniques
• …and coded events from RSS newsfeeds - GDELT

Potential projects:

• Add context to visualization and/or analysis
• Examine how patterns change over varying contexts
• Develop means to identify anomalies, precursor and postcursor events
• Develop capability to visually identify repeating patterns/cycles
• Develop means to facilitate evaluation of pattern importance

What kinds of hidden significant structure exist in complex space-time behaviors?


NILAM RAM

HUMAN DEVELOPMENT & FAMILY STUDIES [email protected]

IGERT BDSS PSU

SEPTEMBER 19, 2014

FINDING MEANING IN THE DATA FOREST


stressdirections.com

Data Acquisition Data Fusion Data Management Data Visualization Data Mining Data Modeling

In-Vivo Data In-Silica Data

In-Virtual Data Real-Time Data Interactive Data


HMM (STATE SEQUENCE) ESTIMATION REAL TIME ANALYSIS TIME-AWARE RECOMMENDATIONS

Probabilistic state sequence extracted from 4-state HMM

ID# 103, age 2-mo Inoculation Paradigm

$$X_{t+1} = A X_t + V_{t+1}$$
$$Y_{t+1} = C X_t + W_{t+1}$$
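A minimal sketch of how a probabilistic state sequence can be extracted from a 4-state HMM, using the standard forward (filtering) recursion. Everything specific here is an assumption for illustration, not taken from the slide: the Gaussian emission model, the hypothetical names `forward_filter`, `means`, and `sigma`, the row-stochastic convention for `A`, and the toy numbers. The slide's own model may differ.

```python
import numpy as np

def forward_filter(obs, A, means, sigma, pi0):
    """Filtered state probabilities P(state_t | y_1..y_t) for each time t.

    A[i, j] is the probability of moving from state i to state j
    (rows sum to 1). Emissions are assumed Gaussian with a per-state
    mean and a shared standard deviation (an illustrative choice).
    """
    means = np.asarray(means, dtype=float)
    pred = np.asarray(pi0, dtype=float)      # prior over the first state
    probs = np.zeros((len(obs), len(pred)))
    for t, y in enumerate(obs):
        # Gaussian likelihood of y under each state, up to a constant factor
        lik = np.exp(-0.5 * ((y - means) / sigma) ** 2)
        post = pred * lik
        post /= post.sum()                   # normalize to a distribution
        probs[t] = post
        pred = post @ A                      # one-step-ahead prediction
    return probs

if __name__ == "__main__":
    # Hypothetical 4-state chain that tends to stay in its current state.
    A = np.full((4, 4), 0.1) + 0.6 * np.eye(4)
    probs = forward_filter([0.0, 0.2, 5.1, 9.8, 15.0], A,
                           means=[0.0, 5.0, 10.0, 15.0], sigma=1.0,
                           pi0=[0.25, 0.25, 0.25, 0.25])
    print(probs.argmax(axis=1))              # most probable state at each step
```

Each row of the returned array is a probability distribution over the four states, so it can be plotted directly as a state-probability sequence over time.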

CELLULAR AUTOMATA SIMULATIONS OF COMPLEX EMERGENT BEHAVIOR + INTERACTIVE DATA VIZ GAMING

dt = .01, R = .2, A = .08, B = 1.5, C = .15, Du = .5, Dv = 20
100x100 grid with periodic boundaries, random uniform initial conditions (0,.1)

$$\frac{\partial u}{\partial t} = R\,f(u,v) + D_u \nabla^2 u$$
$$\frac{\partial v}{\partial t} = R\,g(u,v) + D_v \nabla^2 v$$
$$f(u,v) = A - Bu + \frac{u^2}{v\,(1 + Cu^2)}, \qquad g(u,v) = u^2 - v$$
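A rough sketch of how a simulation with these settings could be run. It assumes the slide's equations are an activator-inhibitor reaction-diffusion system of the form du/dt = R f(u,v) + Du ∇²u and dv/dt = R g(u,v) + Dv ∇²v, with f = A - Bu + u²/(v(1 + Cu²)) and g = u² - v. That reading, the explicit Euler update, the unit grid spacing, and the function names `laplacian`, `step`, and `simulate` are all assumptions, not the presenter's implementation.

```python
import numpy as np

def laplacian(z):
    """Five-point Laplacian on a unit-spaced grid with periodic boundaries."""
    return (np.roll(z, 1, axis=0) + np.roll(z, -1, axis=0)
            + np.roll(z, 1, axis=1) + np.roll(z, -1, axis=1) - 4.0 * z)

def step(u, v, dt=0.01, R=0.2, A=0.08, B=1.5, C=0.15, Du=0.5, Dv=20.0):
    """Advance both fields by one explicit Euler step."""
    f = A - B * u + u**2 / (v * (1.0 + C * u**2))   # assumed activator kinetics
    g = u**2 - v                                    # assumed inhibitor kinetics
    u_new = u + dt * (R * f + Du * laplacian(u))
    v_new = v + dt * (R * g + Dv * laplacian(v))
    return u_new, v_new

def simulate(n=100, steps=1000, seed=0):
    """Run from random uniform(0, 0.1) initial conditions on an n x n grid."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.1, size=(n, n))
    v = rng.uniform(0.0, 0.1, size=(n, n))
    for _ in range(steps):
        u, v = step(u, v)
    return u, v

if __name__ == "__main__":
    u, v = simulate(n=100, steps=1000)
    print(u.mean(), v.mean())
```

With dt = .01 and Dv = 20 on a unit grid, Dv·dt = 0.2 stays below the 0.25 stability limit of the explicit five-point scheme, which is why no smaller time step is needed in this sketch.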


ENSEMBLE METHODS FOR (UN)STRUCTURED DATA


Data Privacy, Causal Inference, Categorical Data methodologies …

Aleksandra (Sesa) [email protected]

Departments of Statistics & Public Health Sciences

Pennsylvania State University

Sep 19, 2014 @ BDSS matching day

Privacy in Statistical Databases

[Diagram labels: Agency/Organization/Database; Respondents/Individuals/Organizations; Users; Queries; Answers]

Users: Government, Researchers, Businesses, Clinicians, Patients (or) Malicious adversary

•  Large collections of personal information
•  census & survey data
•  social networks
•  medical / public health / genomic
•  web search records, etc.

Collect → Store → Analyze/Share

Cloud computing

Privacy Research questions

•  Research Matters: Privacy in Statistical Databases
•  Main theme: integrating computer science and statistical approaches to data privacy
   –  Social, Behavioral & Economic data
   –  Trade-off between data utility and disclosure risk
   –  Rigorous privacy definitions (e.g., Differential Privacy)
   –  Synthetic data
   –  Privatization of social networks data
   –  Private Genome-wide association studies
   –  Privacy with Distributed databases

Image ref: http://www.orgnet.com/email.html
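To make the trade-off between data utility and disclosure risk concrete, here is a small sketch of the Laplace mechanism, one well-known way to satisfy differential privacy for a counting query. It is an illustration only and not drawn from the slides: the function name `laplace_count`, the toy data, and the choice of a counting query are all assumptions.

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """Release a noisy count that satisfies epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so the
    query has sensitivity 1 and the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for row in data if predicate(row))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ages = [23, 67, 71, 45, 80]          # hypothetical respondent ages
    # Smaller epsilon: stronger privacy, noisier (less useful) answer.
    for eps in (0.1, 1.0, 10.0):
        print(eps, laplace_count(ages, lambda a: a >= 65, eps, rng))
```

The single parameter epsilon controls both sides of the trade-off: lowering it reduces disclosure risk and increases the expected error of the released count.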

Other projects

•  More general categorical data methodologies (Bayesian analysis, algebraic statistics, …) with observational data
   –  Causal Inference
   –  Ecological Inference
•  Statistical Data integration
   –  Combining data from multiple sources
   –  Merging big data with probability samples. Can we use information from surveys to help generalize analyses from large-scale administrative/private or organic data?
•  Population size estimation
•  Data analysis and methodology with small n, large p problems in two settings
   –  CSCW and HCII data
   –  Study of communication and awareness in online collaborative tools
   –  Questioners and log-activity data
   –  NEW: Neural data and neuroimaging fMRI data analysis modeling language plasticity in bilinguals
   –  Time and frequency domain analyses

Communication-based diffusion
Rachel Smith, Communication Arts & Sciences

1. PEPFAR Namibia

• Existing data available for network analysis and HIV-related indicators
  – Two-mode networks (persons and community groups)
  – Cross-sectional
  – 15 communities, ~n=300 in each site

2. ‘Contagious’ messages

• Collect and analyze new data
  – Track online messages related to Ebola
  – Predict a) what types of messages get passed on to another person, and b) what aspects of the message change and remain the same in the ‘retelling’
  – Compare end to CDC or WHO stories and advice

Promoting Intergenerational Communication through Facebook
Dr. S. Shyam Sundar
College of Communications

Different Use of Facebook among Senior Citizens (N=352)

Jung, E.H. & Sundar, S. S. (2014). Senior Citizens on Facebook: How do they Interact and Why? Paper presented at the 96th annual conference of the Association for Education in Journalism and Mass Communication, Montreal, Canada.

• Social bonding: One-to-one communication (e.g., commenting, chatting)
• Social bonding & Social bridging: Self-presentation activities (e.g., updating status)
• Social bonding & Curiosity: Social surveillance activities (e.g., checking out people’s walls)

Frequency of senior citizens’ participation in Facebook activities (N = 168)

Facebook Activity                        Mean   SD
Stay in touch with friends and family    3.27   1.43
Reunite with old friends                 2.56   1.14
Keep up with others’ activities          2.39   1.14
Comment on others’ postings              2.38   1.17
View or upload photographs               2.33   1.29
Pass the time                            2.15   1.77
Keep up with current events              1.95   1.20
Update my status                         1.88   1.00
Browse profiles                          1.80   1.08
Post items (e.g., news articles)         1.78   0.91

Sundar, S. S., Oeldorf-Hirsch, A., Nussbaum, J. F., & Behr, R. A. (2011). Retirees on Facebook: Can online social networking enhance their health and wellness? Proceedings of the 2011 Annual Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA’11), 2287-2292.


BIG DATA

LIKES

PHOTOS

WALL POSTS

COMMENTS

CHATTING

PRIVATE MESSAGES

What are senior citizens doing on Facebook for intergenerational communication?

What’s on your mind?

BIG DATA @ FACEBOOK

• What kinds of technology affordances do senior citizens use? With whom are they using it?

• What is the relationship between sender and receiver?

• Through sentiment analysis, how much social support do they receive from family members on Facebook?


Leadership and Sentiment Analysis of an Online Cancer Support Community using Computational Text Mining

Kenneth Portier & Greta Greer, American Cancer Society
John Yen, Prasenjit Mitra, Kang Zhao, Baojun Qiu, Dinghao Wu, & Cornelia Caragea, The Pennsylvania State University

Introduction

Online communities are an important source of social support for cancer survivors and caregivers. The ACS Cancer Survivors Network (CSN) is the oldest and largest of these, with 160,000+ members. This study used computational text mining analysis of 48,779 threaded discussions (468,000 posts by 27,173 members) to identify emerging community leaders and classify user sentiment over sequential posts.

Methods

Leader Analysis: Posts of 41 recognized CSN leaders and 2366 other users were analyzed. 21 leadership characteristics were scored and used to calibrate single and ensemble classifiers capable of correctly identifying leaders.

Leader Analysis Results

Sentiment Analysis Results

Implications

78% and 85% of community leaders are correctly identified with the single best and ensemble classifiers respectively.

The best fitting sentiment classifier has an 80% correct classification rate with 68.8% of posts classified as positive. 75% of negative thread originators subsequently express positive sentiment when at least one reply is received from peers. Probability increases with number of replies. Positive thread initiators are more likely to have positive subsequent sentiment than negative thread originators.

• Early/proactive identification of potential leaders gives community managers the opportunity to encourage the growth of desired leadership qualities and thereby maintain strong peer leadership.

• Sentiment analyses results support the hypothesis that online cancer communities like CSN can effectively facilitate peer interactions in a safe, welcoming environment to help members feel more positive about their situation.

Influence

Micro level

Network structure, diffusion, and evolution Macro level

Sentiment influence & Influential users

Members’ publishing behaviors and influence

Information diffusion & the evolution of collaboration networks

Sentiment Analysis: User sentiment was computed through a multi-stage process. 13 lexical/style features were extracted from a training set of 298 randomly-selected posts manually assigned to positive (204) or negative (94) sentiment, then used to calibrate 5 classifiers. Utilizing the best-fit classifier, sentiment level was established for all 468,000 posts, and change in sentiment between users’ initial and subsequent posts examined.

[Figure: leader identification workflow. Labels: Community Member Posting; Features; Classifiers; Decision (Leader / Participant); Training Set (Leader / Participant); Full Community.]

• Contribution features
– The numbers of posts/threads
– The length of posts
– The time span of one’s activities
– …
• Centrality features
– A post-reply network among users
– Nodes and edges
– In/out-degree, betweenness, PageRank
• Semantic features
– Appearance of words with positive/negative sentiment in a user’s posts
– The use of slang and emoticons
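As a rough illustration of the centrality features listed above, the sketch below computes in/out-degree and PageRank on a small directed post-reply network using only the standard library. The edge direction (replier to original author), the damping factor of 0.85, and the user IDs are assumptions made for this example; the poster does not state how these features were computed, and betweenness is omitted for brevity.

```python
from collections import defaultdict


def degree_features(edges):
    """Count (in-degree, out-degree) per user in a directed reply network.

    Each edge (replier, author) means `replier` replied to a post by `author`.
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for replier, author in edges:
        out_deg[replier] += 1
        in_deg[author] += 1
    users = set(in_deg) | set(out_deg)
    return {u: (in_deg[u], out_deg[u]) for u in users}


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; users with no out-links spread rank uniformly."""
    nodes = sorted({u for edge in edges for u in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1 - damping + damping * dangling) / n for u in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical reply edges: (replier, author of the post replied to).
edges = [("u2", "u1"), ("u3", "u1"), ("u1", "u2")]
features = degree_features(edges)
ranks = pagerank(edges)
```

Here `u1` receives the most replies, so it ends up with both the highest in-degree and the highest PageRank in this toy network.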

80% use Internet for health-related purposes

Adult Internet users in the U.S.

1 in 4 joins OHCs

[Figure: sentiment classification workflow. Labels: Community Member; Lexical/style Features; Classifier; Decision; Leader; Participant; Full Community with unknown sentiment; Training Set with assigned sentiment (+ / -); Similarity measure.]

[Figure: sentiment model pipeline. Labels: CSN Forum Posts; Labeled Posts; Non-Labeled Posts; Feature Extraction; Model Selection; Sentiment Model; (Post, Pr, Label).]

ROC Area = Area under the Receiver Operating Characteristic curve
Best is AdaBoost: False positive rate 0.152, False negative rate 0.33
Best Features: Post_Length, #_Negitive_Words, #_Internet_Slang_Words, #_Names_Mentioned, (N#_Pos+1)/(#_Neg+1), PosStrength, NegStrength
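For readers unfamiliar with the two error rates quoted above, the sketch below shows how they are conventionally computed from true and predicted labels. It assumes "positive" sentiment is the positive class and uses made-up labels; `error_rates` is a hypothetical helper, not code from the study.

```python
def error_rates(y_true, y_pred, positive="pos"):
    """Return (false positive rate, false negative rate) for binary labels.

    FPR = FP / (all true negatives), FNR = FN / (all true positives).
    """
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    negatives = sum(1 for t in y_true if t != positive)
    positives = len(y_true) - negatives
    return fp / negatives, fn / positives


# Hypothetical labels, for illustration only.
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "pos", "pos", "neg"]
fpr, fnr = error_rates(y_true, y_pred)
```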

[Figure: thread structure. Labels: Initial Post (P1); Responding Reply (R1); 1st Self-Reply (P2); M-th Self-Reply (Pn); Responding Reply (Rm).]

Sentiment Change Indicator: The difference between the average sentiment of the originator’s self-replies and the initial sentiment of the thread originator.
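The indicator defined above is a simple difference of averages. A minimal sketch follows, assuming each post already carries a numeric sentiment score (here, a hypothetical probability of being positive); `sentiment_change` is an illustrative name rather than the authors' code.

```python
def sentiment_change(initial_sentiment, self_reply_sentiments):
    """Average sentiment of the originator's self-replies minus the
    sentiment of the initial post. Returns None when the originator
    never replied in the thread.
    """
    if not self_reply_sentiments:
        return None
    mean_self = sum(self_reply_sentiments) / len(self_reply_sentiments)
    return mean_self - initial_sentiment


# Hypothetical scores: initial post P1, then self-replies P2..Pn.
change = sentiment_change(0.25, [0.5, 0.75, 1.0])
```

A positive value means the originator's later posts were, on average, more positive than the post that opened the thread.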

The more positive the sentiment of replies from others, the more positive the originator became.

Classifiers: Naïve Bayesian, Logistic Regression, Random Forest, One-Class SVM, Two-Class SVM

Ensemble Classifier


Topic Discovery Using Discussion Posts in an Online Cancer Community

Kenneth Portier & Greta Greer, American Cancer Society
John Yen, Prasenjit Mitra, Siddhartha Banerjee, Mo Yu, & Prakhar Biyani, The Pennsylvania State University
Lior Rokach & Nir Ofek, Ben-Gurion University of the Negev

• The ACS Cancer Survivors Network (CSN) is the oldest and largest online community for cancer survivors, with 25K unique visits per day.

• Question: Are different discussion topics associated with different sentiment changes, a measure of social support, of the thread initiators?

• Method: sentiment analysis and topic modeling
• Data: CSN breast and colorectal cancer discussion forum posts from 2005-2010

Sentiment Analysis:
• User sentiment is computed through a multi-stage process.
• 13 lexical/style features are extracted from a training set of 298 randomly-selected posts manually assigned to positive (204) or negative (94) sentiment.

• The training set is used to calibrate 5 classifiers.

• Utilizing the best-fit classifier (80% correct classification rate), sentiment level was established for all 468,000 posts (68.8% Positive).

• Estimate the impact of Responding Reply sentiment on sentiment change of thread initiators.

• Negative thread initiators are likely to have positive sentiment change.

• Sentiment Change Score is computed as the difference between the average sentiment of the originator’s self-replies and the initial sentiment of the thread originator.

• Typically only one or two days pass between the initial post and the first follow-up response by the thread initiator.

• The sentiment change observed is likely a reaction to the positive sentiment posts from the community.

The more positive the sentiment of replies from others, the more positive the originator became.

Topic Model Analysis:
• Topics of thread-initiating posts were identified using Modified Latent Dirichlet Allocation (LDA-VEM), which assigns each initiating post the probabilities of belonging to each topic.

• Posts were classified to the highest-probability topic.

• Analyses of 20 to 50 topics indicated choice of 30 topics as being reasonable for both forums.

• Selected word combinations (bi-grams) identified in an initial analysis of the posts were subsequently converted to single words (e.g., “breast cancer” to “breastcancer”) to retain their meaning.

• Remaining words were reduced to root form (i.e., stemming).

• Terms occurring very often (> 80% of posts) or very seldom (<5 posts) were removed prior to analysis.
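The preprocessing and assignment steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, stemming and the LDA-VEM fit itself are left out, and the thresholds simply mirror the ones quoted in the list (terms in more than 80% of posts or in fewer than 5 posts are dropped).

```python
from collections import Counter


def merge_bigrams(tokens, bigrams):
    """Join listed word pairs into single tokens (breast cancer -> breastcancer)."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def filter_vocabulary(docs, max_share=0.80, min_docs=5):
    """Keep terms found in at least `min_docs` posts and in at most
    `max_share` of all posts; drop everything else from each post."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    keep = {t for t, n in doc_freq.items()
            if n >= min_docs and n / len(docs) <= max_share}
    return [[t for t in doc if t in keep] for doc in docs]


def top_topic(topic_probs):
    """Index of the highest-probability topic for one post."""
    return max(range(len(topic_probs)), key=topic_probs.__getitem__)


# Hypothetical post and topic probabilities, for illustration only.
tokens = merge_bigrams(["breast", "cancer", "scan"], {("breast", "cancer")})
assigned = top_topic([0.1, 0.7, 0.2])
```

In practice the filtered token lists would be passed to a topic-model implementation, and `top_topic` would be applied to each post's fitted topic distribution.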

Methods Overview

Breast Cancer Discussion Forum / Colorectal Cancer Discussion Forum
• Average sentiment change index (and associated 95% confidence intervals) vs. main post topic for the CSN breast and colorectal cancer discussion boards.

• High average sentiment change scores indicate that community responses have a positive effect on the emotions of the thread initiators.

• Low average sentiment change scores could indicate either that community response has little impact on the initiator’s emotions or (more likely) that the initial post sentiment was high to begin with.

• Sentiment change score vs initial sentiment by topics for the breast and colorectal cancer discussion board.

• Each box is centered on the mean for each topic with the area representing 95% confidence.

• Topics with high average initial post sentiment tend to have lower average sentiment change scores.
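A per-topic mean with a 95% confidence interval, as plotted in the results, can be computed as sketched below. The poster does not say how its intervals were obtained; this example assumes a normal approximation (mean ± 1.96 standard errors), and the topic names and scores are made up.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(scores):
    """Mean and normal-approximation 95% CI: mean +/- 1.96 * SE."""
    m = mean(scores)
    half_width = 1.96 * stdev(scores) / sqrt(len(scores))
    return m, m - half_width, m + half_width


def summarize_by_topic(rows):
    """Group (topic, change_score) rows and summarize topics with >= 2 posts."""
    by_topic = {}
    for topic, score in rows:
        by_topic.setdefault(topic, []).append(score)
    return {t: mean_ci95(s) for t, s in by_topic.items() if len(s) >= 2}


# Hypothetical (topic, sentiment change score) rows.
rows = [("pain", 0.2), ("pain", 0.4), ("pain", 0.6), ("humor", 0.1)]
summary = summarize_by_topic(rows)
```

Topics with a single post are skipped because a sample standard deviation is undefined for one observation.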

Results

• The increased understanding of topics and related sentiment supports development of ancillary information to be made available to CSN members to supplement forum discussions.

• We envision using these results to create tools that:
– Notify community leaders when posts with low initiating sentiment do not produce adequate community response.
– Point the initiator to threads where the topic may have been discussed in the recent past.

• These improvements can further strengthen community social support and, subsequently, members’ quality of life.

• Both forums show that pain, medical worries, and treatment side-effect issues initiate with very low (most negative) sentiment and have highest sentiment change.

• Breast cancer posts tend to start with lower sentiment than colon cancer posts, while their sentiment change tends to be higher.

Conclusions

[Figure: thread structure. Labels: Initial Post (P1); 1st Self-Reply (P2); M-th Self-Reply (Pn); Responding Replies (R1-Rk) to Initial Post; Responding Replies (R1-Rk) to Self-Reply.]

[Figure: methods overview. Sentiment steps: Cancer Survivors Network (CSN) Discussion Board Posts; Training Set (N=298); Feature Extraction; Classifier Calibration & Selection; Classification of Corpus; Sentiment Change Analysis. Topic steps: Breast & Colorectal Cancer Forums; Initial Post Extraction; Initial Post Key Words; Key Phrase Recoding; Common & Unique Word Removal; LDA-VEM Analysis (Topic Identification); Initial Post Topics & Likelihood. Output: Sentiment Change.]