Upload
jessica-vitak
View
336
Download
0
Embed Size (px)
Citation preview
Beyond the Belmont Principles: Ethical
Challenges, Practices, and Beliefs in the Online Data
Research Community
Jessica Vitak, Katie Shilton, and Zahra AshktorabCollege of Information Studies, University of Maryland
CSCW 2016 | San Francisco, CATwitter: @jvitak, @katieshilton, @zashktorab
#cscw2016
Google processed more
than 20 PETABYTES of data per day
in 2014.
Source: Mashable
Data Free for All?
Not so fast (research) cowboy…
Let’s talk about data ethics first.
Belmont Report & Common Rule
Belmont Report published in 1979 in response to unethical studies in psychology & medicine
Further articulated in Common Rule legislation
Three main principles: 1. Respect for persons: participants know about and
consent to research2. Beneficence: do no harm; maximize benefits,
minimize risks3. Justice: fair distribution of costs/benefits to all
potential participants
Online data create gray areaIs it feasible to collect informed consent? Should you be more transparent about your research?Who is being left out by your data collection strategies?Isn’t public data public?Is it possible to truly anonymize a dataset?
Statement from the SIGCOMM 2015 Program Committee: The SIGCOMM 2015 PC appreciated the technical contributions made in this paper, but found the paper controversial because some of the experiments the authors conducted raise ethical concerns. The controversy arose in large part because the networking research community does not yet have widely accepted guidelines or rules for the ethics of experiments that measure online censorship. In accordance with the published submission guidelines for SIGCOMM 2015, had the authors not engaged with their Institutional Review Boards (IRBs) or had their IRBs determined that their research was unethical, the PC would have rejected the paper without review. But the authors did engage with their IRBs, which did not flag the research as unethical. The PC hopes that discussion of the ethical concerns these experiments raise will advance the development of ethical guidelines in this area. It is the PC’s view that future guidelines should include as a core principle that researchers should not engage in experiments that subject users to an appreciable risk of substantial harm absent informed consent. The PC endorses neither the use of the experimental techniques this paper describes nor the experiments the authors conducted.
Research Questions
1. What are the research ethics practices of researchers using online datasets?
2. What do researchers using online datasets believe constitutes ethical research?
3. How do these practices and beliefs vary among social computing researchers?
Method: Survey InstrumentSurvey items were created based on findings from 20 interviews with scholars in social-computing fields1
Items captured: Data Sources, Methods, and Analyses
employedData collection, analysis and sharing practicesAttitudes toward appropriateness of various
methods and practicesItems capturing researchers personal codes of
ethicsDemographics1 See Shilton & Sayles (2016) for more information on this initial qualitative work.
Method: Identifying population
Focused on researchers in social computing-related fields where online data analysis is common.Used databases to identify articles published since 2011 at eight conferences: CSCW, CHI, ICWSM, iConference, WWW, Ubicomp, CKIM, and KDD.
Search terms included: “trace ethnography,” “big data,” “twitter,” “forums,” “text mining,” “logs,” “activity traces,” and “social network”
Also posted survey invite to relevant listservs, including AoIR, AIS, CITASA, AIS ICA, STS, and NCA
Method: Participant Details
Based on database search, we compiled a list of ~2800 unique names.Sent initial invites plus two reminders through SurveyGizmo over two-week periodReceived 263 completed surveys, along with some interesting emails from researchers…
Variable Mean (SD)/N (%)
Sex MaleFemale
159 (60.5%)93 (35.4%)
Education Bachelor’sMaster’sPhD
15 (5.7%)61 (23.2%)180 (68.4%)
Current Location United StatesUKCanadaGermany23 other countries (<10 participants)
162 (61.6%)21 (8.0%)14 (5.3%)12 (4.6%)47 (17.8%)
Degree In Communication & MediaComputer Science/EngineeringHCIInformationSocial SciencesOther
33 (12.5%)100 (38%)10 (3.8%)50 (19%)
37 (14.1%) 27 (10.3%)
Current Field of Work
Academia (Research Focus)IndustryPolicy/Government/Non-Profit
207 (78.7%)35 (13.3%)12 (4.6%)
Sample Demographics (N=263)
Findings: Researcher Code of Ethics
Analyzed two ways:1. Iterative coding by authors of open-ended
responses to question, “How would you describe your personal code of ethics regarding online data?”*
2. EFA of 35 survey items relating to research attitudes and practices
*We received 162 (63%) responses, with an average length of 35 words (median=23.5; SD=35).
Code Definition Example StatementsPublic Data Only using public data / public
data being okay to collect and analyze
In general, I feel that what is posted online is a matter of public record, though every case needs to be looked at individually in order to evaluate the ethical risks.
Do No Harm
Comments related to Golden Rule
Golden rule, do to others what you’d have them do to you.
Informed Consent
Always get informed consent / stressing importance of informed consent
I think at this point for any new study I started using online data, I would try to get informed consent when collecting identifiable information (e.g. usernames).
Greater Good
Data collection should have a social benefit
The work I do should address larger social challenges, and not just offer incremental improvements for companies to deploy.
Established Guidelines
Including Belmont Report, IRBs Terms of Service, legal frameworks, community norms
I generally follow the ethical guidelines for human subjects research as reflected in the Belmont Report and codified in 45.CFR.46 when collecting online data.
Risks vs. Benefits
Discussion of weighing potential harms and benefits or gains
I think I focus on potential harm, and all the ethical procedures I put in place work towards minimizing potential harm.
Protect Participants
data aggregation, deleting PII, anonymizing / obfuscating data
I aggregate unique cases into larger categories rather than removing them from the data set.
Data Judgments
Efforts to not make inferences or judge participants or data
Do not expose users to the outside world by inferring features that they have not personally disclosed.
Transparency
Contact with participants or methods of informing participants about research
I prefer to engage individual participants in the data collection process, and to provide them with explicit information about data collection practices.
Emergent themes from qualitative responses: researchers’ personal code of ethics
Item M SD...notify participants about why they’re collecting online data1
3.89 0.96
...share research results with research subjects1 3.90 0.80
...Ask colleagues about their research ethics practices1 4.27 0.74
...Ask their IRB/internal reviews for advice about research ethics1
4.03 0.90
...Think about possible edge cases/outliers when designing studies1
4.33 0.71
...Only collect online data when the benefits outweigh the potential harms1
3.62 1.10
...Remove individuals from datasets upon their request1 4.56 0.71Researchers should be held to a higher ethical standard than others who use online data2
3.46 1.22
I think about ethics a lot when I'm designing a new research project2
3.96 0.93
Full Scale (α=.71) 4.00 0.491 Prompt: “I think researchers should....” 2 Prompt: “To what extent do you agree with the following statements?” Both sets of items were measured on five point, Likert-type scales (Strongly Agree-Strongly Disagree).
Codification of Ethical Attitudes Measure
Independent Variables ß (t-test) pSex: Female .01 (.13) .90Academic Degree Social Science (1) Computer Science (1)
-.06 (-.72)
-.13 (-1.45)
.48.15
Field of Work Academia (1) Industry (1)
-.16 (-1.23)-.31 (-2.43)
.22.02
Level of Education .01 (.07) .94Location: U.S. -.01 (-.20) .84F-test 1.56 .15Adjusted R2 .02 --
OLS Regression for Codified Ethics
*Industry workers’ average agreement with the codified ethics scale (M=3.75) was significantly lower than non-profit/policy workers (M=4.11). Academics (M=4.02) were not significantly different from either group.
Low variance (<.8) High variance (> 1.2)Agreement (>3.5)
Remove subjects from datasets upon request1
Ask (1) colleagues or (2) IRBs about research1
Share results with participants1
Think about edge cases/outliers1
Use non-representative samples2
Remove unique individuals before sharing1
Researchers can’t collect large-scale online data if consent is required
Disagreement (<3)
No items.
Notes: 1 “I think researchers should…” 2 “It’s permissible for researchers to…”
Ignore ToS when necessary2
Deceive participants2
Share raw data with key stakeholders1
It’s possible to obtain informed consent with large-scale studies
Comparing level of agreement and item variable across the corpus
Key TakeawaysContext matters. A lot.
Dependencies between ethical principles.
Beliefs & norms are often in flux.
Disciplinary differences?The data suggest no.Such assumptions risk
talking past each other rather than learning from others’ experiences
Ethics Heuristics for Online Data Research: Beyond the Belmont Report
1. Focus on transparency Openness about data collectionSharing results with community
leaders or research subjects
2. Data minimizationCollecting only what you need to
answer an RQLetting individuals opt outSharing data at aggregate levels
3. Increased caution in sharing results4. Respect the norms of the contexts in which online data
was generated.
Future directions for this research
This is first of three surveys by our research group. Each survey focuses on a different stakeholder:
1. Researchers (this study)2. IRBs (paper in progress)3. Users (study in development)
Other avenues for discussion and action: workshops (e.g., CSCW 2015, SIGCOM 2015)Panels (e.g., iConference 2016)Working groups/meetings with policymakers (e.g.,
Future of Privacy Forum, December 2015)
Thanks! Jessica Vitak Katie Shilton Zahra [email protected] [email protected] [email protected] @jvitak @katieshilton @zashktorab