DEPARTMENT OF SOCIOLOGY

The Complexity of Data: Computer Simulation and “Everyday” Social Science

Edmund Chattoe-Brown

[email protected]

Plan of talk

• Simulation as a confusing term.

• A simple (but revealing) example.

• The importance of data collection: Simulation methodology.

• Where does complexity fit into all this?

• A more challenging example: DrugChat.

• Conclusions.

Simulation as a confusing term

• Not “gaming” or “role playing”: Student United Nations.

• Not system dynamics, discrete event simulation, analogue simulation and so on, though these are ancestors.

• Not simulation as discussed by Bourdieu, whatever that is.

• Instrumental versus descriptive simulation: Not just a technical tool (doing the same sums quicker) but a distinctive way of understanding (explaining) social behaviour.

• A social process described as a computer programme rather than a narrative or a statistical/mathematical model.

• Other disciplines, other approaches: Experiments, time series, documents/content analysis, GIS.

Spatial segregation (Schelling)

• Agents live on a square grid (like a US city) so each has a maximum of eight neighbours.

• There are two “types” of agents (red and green) and some spaces in the grid are vacant. Initially agents and vacancies are distributed randomly.

• All agents decide what to do in the same very simple way.

• Each agent has a preferred proportion (PP) of neighbours of its own kind. (A PP of 0.5 means you want at least half your neighbours to be your own kind, but you would be happy with all of them being so, i.e. PP is a minimum.)

• If an agent is in a position that satisfies its PP then it does nothing; otherwise it moves to an unoccupied position chosen at random.

• A time period is defined as the time it takes for each agent (chosen in random order to avoid non-robust patterns) to “take a turn” at deciding and possibly moving.
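The rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind the talk: the function names, the "R"/"G" labels, the `vacancy_rate` parameter and the choice to treat an agent with no neighbours as satisfied are all assumptions made here.

```python
import random

def neighbours(grid, r, c):
    """Types of the agents in the (up to eight) cells around (r, c)."""
    size = len(grid)
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < size and 0 <= cc < size
            if (dr, dc) != (0, 0) and inside and grid[rr][cc] is not None:
                found.append(grid[rr][cc])
    return found

def is_satisfied(grid, r, c, pp):
    """True if at least a proportion pp of the agent's neighbours share its type."""
    near = neighbours(grid, r, c)
    if not near:
        return True  # assumption: an agent with no neighbours stays put
    same = sum(1 for kind in near if kind == grid[r][c])
    return same / len(near) >= pp

def make_grid(size, vacancy_rate, rng):
    """Random initial state: red ("R"), green ("G") or vacant (None)."""
    def cell():
        return None if rng.random() < vacancy_rate else rng.choice("RG")
    return [[cell() for _ in range(size)] for _ in range(size)]

def step(grid, pp, rng):
    """One time period: each agent, in random order, takes one turn.

    Returns the number of agents that moved.
    """
    size = len(grid)
    agents = [(r, c) for r in range(size) for c in range(size)
              if grid[r][c] is not None]
    rng.shuffle(agents)
    moves = 0
    for r, c in agents:
        if is_satisfied(grid, r, c, pp):
            continue
        vacant = [(i, j) for i in range(size) for j in range(size)
                  if grid[i][j] is None]
        if not vacant:
            break
        nr, nc = rng.choice(vacant)  # unoccupied position chosen at random
        grid[nr][nc], grid[r][c] = grid[r][c], None
        moves += 1
    return moves
```

Calling `step` repeatedly on a grid from `make_grid` until it returns 0 (or a period limit is reached) gives one run of the model for a given PP.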

Initial random state

Two questions

• What is the smallest PP (i.e. number 0-1) that will produce clusters?

• What happens when the PP is 1?

Simple individuals but complex system

[Figure: “Individual Desires and Collective Outcomes”. Horizontal axis: % Similar Wanted (Individual). Vertical axis: % Similar Achieved (Social). Plotted series: % similar and % unhappy.]

Counter-intuitive macro (social) results from simple micro interactions. A non-linear system.
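One way to explore the relationship between what individuals want and what the system achieves is to sweep PP and record the two aggregate measures. The sketch below is a compact, self-contained restatement of the Schelling rules under assumptions made here (grid size, vacancy rate, number of periods, and isolated agents counting as content); `run` and `share_similar` are hypothetical names, and no particular output is claimed.

```python
import random

OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def share_similar(agents, pos):
    """Proportion of an agent's neighbours of its own type (None if it has none)."""
    r, c = pos
    near = [agents[(r + dr, c + dc)] for dr, dc in OFFSETS
            if (r + dr, c + dc) in agents]
    if not near:
        return None
    return sum(kind == agents[pos] for kind in near) / len(near)

def run(pp, size=20, vacancy_rate=0.2, periods=50, seed=0):
    """Run the model for one PP value; return (% similar, % unhappy) at the end."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    agents = {cell: rng.choice("RG") for cell in cells
              if rng.random() >= vacancy_rate}
    for _ in range(periods):
        order = list(agents)
        rng.shuffle(order)               # random order of turns
        moved = False
        for pos in order:
            share = share_similar(agents, pos)
            if share is None or share >= pp:
                continue                 # satisfied (or isolated): do nothing
            vacant = [cell for cell in cells if cell not in agents]
            if not vacant:
                continue
            agents[rng.choice(vacant)] = agents.pop(pos)
            moved = True
        if not moved:
            break                        # nobody wants to move any more
    shares = [share_similar(agents, pos) for pos in agents]
    known = [s for s in shares if s is not None]
    if not known:
        return 0.0, 0.0
    similar = 100 * sum(known) / len(known)
    unhappy = 100 * sum(s < pp for s in known) / len(shares)
    return similar, unhappy

if __name__ == "__main__":
    for pp in (0.0, 0.25, 0.5, 0.75, 1.0):
        similar, unhappy = run(pp)
        print(f"PP={pp:.2f}  % similar={similar:.1f}  % unhappy={unhappy:.1f}")
```

Repeating the sweep over several seeds would show how far any pattern depends on the random initial state.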

Deconstructing this example

• Clearly unrealistic in many senses: Property values, decision processes, unstructured space, communication, neighbourhood knowledge.

• However, it is not unrealistic in the important sense that the simulation contains no arbitrary parameters and agents operate on plausible local knowledge. The only “parameters” in the model are individual PP values (measured by experiment? Already in surveys: Mare.)

• The simulation also generates unintended consequences (PP=1) and patterns that were not “built in”. For example, is the distribution of empty sites random or buffering? This emergence allows the possibility of genuine falsification and has heuristic fertility: What does compatibility of desires mean? When does it occur?

• We need two sorts of data: Quantitative (what patterns are we trying to explain?) and qualitative (what social processes create these?)

Aside …

• It is very clear that we need the “complexity approach” because we are not very good at deducing how complex systems work “in either direction” (micro to macro or vice versa).

• But what is the complexity approach in this context? Is it a set of methods, a set of subject areas, a family of interesting models/results, a way of looking at problems or all of the above?

• How does “the complexity approach” compare with “the sociology approach” or “the physics approach?”

• Should complexity be more than simulation calibrated on real data? If so, what?

• IMO, the main problem with complexity is “where’s the data?”

Quantitative data collection approach

• Collect survey data: Cross sectional, time series or whatever.

• Choose a model and accept/reject it on grounds of statistical fit.

• Model coefficients are “results” conditional on acceptable model.

• In what sense do models explain observed patterns? (If we find a correlation between income and academic success of a particular size, what have we really learnt?)

• Technical problems: Explanatory range depends on sample size.

• Basic problem doesn’t go away even with “fancier” techniques like time series/multi-level modelling: A description isn’t an explanation.

• Rarely heuristically fertile.

Deriving a quantitative coefficient

[Figure: number of strikes (units) against unemployment (millions). Marked values: 1 and 2 on the unemployment axis, 50 and 80 on the strikes axis.]

Quantitative example

• “The most important empirical findings of this study can be summarized as follows:

• … there is a moderate tendency for individuals with higher service class origins to be more likely than others to enrol in PhD programmes.

• …

• The estimated effect of class drops to zero when controlling for parents’ education and employment in research or higher education.

• The overall implication of these findings is that the transition from graduate to doctoral studies is influenced by social origins to a considerable degree. Thus, the notion that such effects disappear at transitions at higher educational levels - due either to changes over the life course or to differential social selection - is not supported.” (Mastekaasa, Acta Sociologica, 2006, 49(4), pp. 448-449.)

Translating back into simulation …

• Agents start with particular attributes (like being red or green and having a particular PP in Schelling). These might include things like IQ and motivation.

• They undergo a long sequence of social interactions in institutional contexts, being influenced by parents, peers and teachers in classroom, playground, public library and so on. They also make choices and operate within institutional contexts (like rules for “streaming” or school allocation by catchment).

• The quantitative approach described here tries to link “late” attributes (starting a PhD) to “early” ones (parental occupation) in the hope that regularities in social life support this.

• Is parental occupation an attribute or a process?

Qualitative data collection approach

• Collect data (cognitive, behavioural, structural) by observation and questioning.

• Try (though surprisingly rarely) to induce a pattern from the data: Example of the “addiction cycle”, compared with an account of drug use based on amount (frequency) and type.

• Result is rich coherent narrative(s): What heroin addiction means from the inside and in a particular context.

• Are the results generalisable? (What is N?)

• Can we correctly envisage the overall consequences of complex social interaction sequences presented using narratives? (Compare Schelling case again.)

• Often heuristically fertile (“addiction cycle”).

Qualitative example

• “Turkish interviewees do not include themselves when they are evaluating the status of ‘Turkish women’ in general. While referring to ‘Turkish women’, most Turkish interviewees use the pronoun ‘they’:

• Turkish women are more home-oriented. I think that they are left in the backstage because they do not have education, because they are not given equal opportunities with men. (T3)

• One of the Turkish interviewees stated that it was difficult for her to answer the questions related to her status ‘as a woman’, because:

• I don’t think of myself as a Turkish women, but as a Turkish person. I mean I never think about what kind of role I have in the society as a woman. (T1)

• Most Norwegian interviewees, on the other hand, identify with ‘Norwegian women’ in general, and they refer to ‘Norwegian women’ as ‘we’:

• I think that in a way Norwegian women, that is we, at least have our rights on paper. We have equal rights for education and we have good welfare arrangements … (N1)” (Sümer, Acta Sociologica, 1998, 41(1), p. 122)

Translating back into simulation …

• Agents choose “appropriate” actions on the basis of perceived identity.

• A range of identities is “given” to agents by biological difference (skin colour) and social structure (“mother”, “worker”).

• Identities are made more salient by patterns of social interaction and socialisation. For example, perhaps a Turkish upbringing stresses female identities that are traditional (mother) or liberal (worker) and de-stresses the existence of a separate “woman’s identity” while a Norwegian upbringing stresses that identity as the underpinning of both work and child-rearing.

• Clearly this simulation needs to be much more cognitive, contextual and detailed than the Schelling example.

What is going on here?

• Qualitative research tells us how people interact and make decisions within environments but can’t usually tell us what large scale patterns result.

• Quantitative research tells us what the large scale patterns are but may not really explain them. (Inability to reason about complexity may result in naïve attribution, i.e. clusters are taken as evidence of xenophobia.)

• Simulation shows how we might bridge the gap between the levels of description with a “generative” social theory expressed as a computer programme. (Coleman “boat”.)

How are we doing with complexity?

• Large number of elements which interact dynamically.

• Interaction rich (mutual influence between significant numbers of elements).

• Non-linearity.

• Interaction short range and each element ignorant of the behaviour of the system as a whole. [2OE on clusters?]

• Interaction loops.

• Open system far from equilibrium requiring energy input. [?]

• Has a history.

• Source: Compressed losslessly from Cilliers, Complexity and Postmodernism, pp. 3-5.

Different kinds of “difficulty”

• Difficult patterns: Chaos, self-organised criticality. (Mathematical strand: We are studying formal systems, we don’t need data.)

• Difficult mental processes: Reflexivity, self-awareness, subconscious motives. (Social theory strand: We are too embedded in these systems and our reflections on them to bracket anything off as objective data.)

• Difficult social systems: Rich context, negotiated roles, complex artefacts. (Ethnographic strand: The world is too complex for general theories.)

Degrees of similarity in Schelling

• Predict exact positions of clusters?

• Predict that there will be clusters at all?

• Predict spatial stability of clusters?

• Predict the size distribution (or separation) of clusters?

• Predict (for three “types”) that clusters will be separated/nested?

• Predict that most cosmopolitan agents will form perimeters of clusters?

• Predict that empty sites will be randomly distributed for cosmopolitan agents but form buffer zones for more xenophobic agents? (“Looking at the holes”: A heuristic idea, “vacancy chains”.)

Ideal simulation methodology

• Choose a target system: Ethnic segregation in cities.

• Build a simulation of the target system and calibrate it, typically on micro level data: Ethnography and experiments? How do agents make relocation decisions and where do they go?

• Run simulation and look for regularities and their preconditions: Do we observe clusters (always, never, only with high PP, fixed, identical, moving) and buffer zones?

• Compare these regularities with statistical data on real residential patterns. What effective similarity tests do we have?

• If there is a “good” match then we haven’t yet falsified the claim that the simulation “generates” the target system and therefore explains it (a progressive process of course).

The Gilbert and Troitzsch “box”

Case Study I: DrugChat

• A reimplementation of Agar’s DrugTalk for the DTI Foresight Programme.

• Based on ethnographic data but generates some qualitatively realistic aggregate data.

• Problematises both the “attribute” based approach to social regularity and the “transition probability” based approach to modelling.

Assumptions I

• Networks: Many have few ties and few have many.

• Types: Non-users, users and addicts. (Distinguished by patterns of behaviour not level of use.)

• Choice based on attitude to risk (ATR: fixed and normally distributed around 50) and attitude to drugs (ATD: initialised at 50, then varies with experience and social influence).

• System driven by “arrival” of drug doses: Addicts get few doses with high probability, users get more doses with lower probability and non-users get few doses with very low probability.

Assumptions II

• Choice simply compares ATR and ATD (but addicts don’t choose).

• “Stash”: Users share all bar one dose with friends (“partying”) while addicts don’t share.

• Drug use experience is evaluated on each dose and can be good or bad. Counts of these are kept and used to update ATD. Early experiences have more impact than late ones and bad experiences more impact than good.

• After 5 doses, addiction occurs (physiology).
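A minimal sketch of how an agent of this kind might be coded follows. Only the qualitative rules come from the slides (addicts do not choose, addiction after 5 doses, early and bad experiences weigh more). Everything numerical is a hypothetical stand-in chosen here for illustration: the direction of the ATR/ATD comparison, the `BASE_SHIFT` and `BAD_WEIGHT` constants, the 1/n decay and the 0-100 clamp are assumptions, not the DrugChat implementation.

```python
from dataclasses import dataclass

ADDICTION_THRESHOLD = 5   # from the slides: addiction occurs after 5 doses
BASE_SHIFT = 10.0         # hypothetical: effect of the very first experience
BAD_WEIGHT = 2.0          # hypothetical: bad experiences count double

@dataclass
class Agent:
    atr: float            # attitude to risk (fixed)
    atd: float = 50.0     # attitude to drugs (initialised at 50)
    doses: int = 0
    good: int = 0         # count of good experiences
    bad: int = 0          # count of bad experiences

    @property
    def addicted(self) -> bool:
        return self.doses >= ADDICTION_THRESHOLD

    def will_use(self) -> bool:
        """Addicts do not choose; others use when ATD exceeds ATR.

        The direction of this comparison is an assumption.
        """
        return self.addicted or self.atd > self.atr

    def take_dose(self, experience_good: bool) -> None:
        """Record one dose and update ATD; later doses shift ATD less."""
        self.doses += 1
        shift = BASE_SHIFT / self.doses       # hypothetical 1/n decay
        if experience_good:
            self.good += 1
            self.atd += shift
        else:
            self.bad += 1
            self.atd -= BAD_WEIGHT * shift
        self.atd = max(0.0, min(100.0, self.atd))  # hypothetical 0-100 scale
```

For example, an agent created with `Agent(atr=60)` starts as a non-user, because its ATD of 50 does not exceed its ATR. Sharing, gossip and the network of friends are deliberately left out of this sketch.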

Assumptions III

• Addict communication is ignored but status as addicts has strong negative effect on friends.

• Current users have a direct “congruence” influence via drug experience (good or bad).

• Non-current users and non-users only influence slightly through “gossip”, telling “drug stories” (total counts of good and bad experiences across all friends are used to update ATD).

• Clearly a complicated system: Is it a complex one?

Aggregate properties

The statistical approach

Reading these outputs

• Producing an “S curve” is very weak support for the simulation assumptions. Too many other assumption sets produce it too. (Back to issue of qualitative similarity.)

• Because this simulation is only broadly empirical, the failure to predict user status on ATR does not “disprove” the statistical approach. It only shows how systems at a particular level of “complicatedness” (in fact not very high) may break down relationships between attributes which statistical approaches rely on.

Aside …

• The Caulkins model also has three states (user, non-user and addict) and assumes that there are fixed transition rates between states.

• These TRs are for NU to U, U to A, U to NU and A to NU. The only behavioural restriction on the TRs is that A to NU is assumed to be smaller than U to NU.

• This model is fitted to real data.

• What happens if we use the DrugChat simulation to calculate transition probabilities of the Caulkins kind?
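The two ideas in this aside can be sketched side by side: a three-state model driven by fixed transition rates, and a routine that estimates transition proportions from simulated agent histories over a chosen time window. This is an illustrative sketch only. The discrete-time form, the function names and any rate values passed in are assumptions made here, not the Caulkins model as published or the DrugChat code.

```python
from collections import Counter

def project(shares, rates, periods):
    """Apply fixed per-period transition rates to population shares (NU, U, A).

    rates uses the four transitions named in the slide:
    "NU->U", "U->A", "U->NU" and "A->NU".
    """
    nu, u, a = shares
    path = [(nu, u, a)]
    for _ in range(periods):
        nu_to_u = rates["NU->U"] * nu
        u_to_a = rates["U->A"] * u
        u_to_nu = rates["U->NU"] * u
        a_to_nu = rates["A->NU"] * a
        nu = nu - nu_to_u + u_to_nu + a_to_nu
        u = u + nu_to_u - u_to_a - u_to_nu
        a = a + u_to_a - a_to_nu
        path.append((nu, u, a))
    return path

def estimate_rates(histories, start, end):
    """Estimate transition proportions from state histories for periods start..end-1.

    histories is a list of per-agent state sequences such as
    ["NU", "U", "U", "A"]. The result maps (from_state, to_state) to the
    share of agents in from_state that moved to to_state one period later.
    """
    moves, at_risk = Counter(), Counter()
    for states in histories:
        for t in range(start, min(end, len(states) - 1)):
            at_risk[states[t]] += 1
            if states[t + 1] != states[t]:
                moves[(states[t], states[t + 1])] += 1
    return {pair: n / at_risk[pair[0]] for pair, n in moves.items()}
```

Calling `estimate_rates` on an early and a late window of the same simulated histories gives a direct check on whether the fixed-rate assumption is a reasonable approximation for that simulation.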

Transition probabilities in DrugChat

Reading this output

• Again, DrugChat is not calibrated well enough to prove that it is “right” and Caulkins et al. are “wrong”.

• However, this output (not only are transition probabilities not constant but they change sign!) does suggest that constant transition probabilities are not likely to be a very effective approximation in social systems with even a rather low level of “complicatedness”. (The Caulkins model doesn’t even work in the simplified world of DrugChat.)

• Should we start asking questions about how likely different approaches are to work and how we would go about establishing this? (Hendry and model reductions.)

Simulated biography

Reading this output

• Initially there is little information in the system. ATD=50.

• Then the agent has two bad experiences with drugs.

• By then, much gossip and experience is reporting good things about the drug which is true “on average” before its addictive nature is recognised.

• This promotes more use, each time with mixed results.

• Unfortunately by this point, addiction has kicked in.

• This particular agent becomes addicted via social influence, despite several bad drug experiences.

What are we doing here?

• Collecting different kinds of data from the simulated system which can be compared not only with real data but with underlying assumptions of various theoretical approaches (simple statistical models, models based on “stocks and flows”). Access to multiple kinds of data allows stronger falsification of methods and models.

• Reflecting (at least broadly) on where we might get the kinds of data we need to calibrate the model properly (behavioural, cognitive, physiological, institutional, structural) within the context of existing methods.

Why is this a good idea?

• Simulated systems recognise and can represent different kinds of social “difficulty” - which may include various things people intend by complexity (reflexivity, chaotic output) but also make their “ontological” status clearer. (Is this “difficulty” in the heads of individuals, in their processes of interaction or what?)

• However, unlike a lot of complexity theory (albeit for different reasons) there is an “old fashioned” commitment to integrating data and theory and to explaining across levels of description. This may work better using the new approach too.

Conclusions

• Complexity needs to think very carefully about what “kind of thing” it is if it is going to survive after the “fad” phase.

• Simulation has tools to offer the approach which (at least in principle) tap into the methods and data of social science. (I haven’t talked about the physical sciences but I think some of the same arguments go through.)

• Simulation of Innovation: A Node (SIMIAN): ESRC funded under NCRM for three years with Professor Nigel Gilbert (Sociology @ Surrey) to train and do methodologically innovative research. A good time for collaboration?

Now read on?

• Gilbert and Troitzsch (2005) Simulation for the Social Scientist, second edition (Open University Press). [Examples/resources online. All examples in NetLogo.]

• Journal of Artificial Societies and Social Simulation: http://jasss.soc.surrey.ac.uk/ [Free, fully peer reviewed, interdisciplinary and only online.]

• Chattoe (2006) ‘Using Simulation to Develop and Test Functionalist Explanations: A Case Study of Dynamic Church Membership’, British Journal of Sociology, 57(3), September, pp. 379-397.

• Chattoe and Hamill (2005) ‘It’s Not Who You Know – It’s What You Know About People You Don’t Know That Counts: Extending the Analysis of Crime Groups as Social Networks’, British Journal of Criminology, 45(6), pp. 860-876.

• Chattoe, Hickman and Vickerman (2005) Foresight: Drugs Futures 2025? Modelling Drug Use, Office of Science and Technology, Department of Trade and Industry. [Available from the presenter or online.]