
What you want is not what you get:

Predicting sharing policies for text-based content on Facebook

Arunesh Sinha*, Yan Li †, Lujo Bauer*

*Carnegie Mellon University

†Singapore Management University


Motivation


More User Control ⇏ Better Privacy

o Users fail to comprehend controls

o Users fail to comprehend consequences

o Though concerned, users often make no effort to use controls better


More user control → Smarter user control

Our goal: Help users pick correct policy for new Facebook posts

Facebook’s Strategy

[Figure: Facebook Wall with posts n-2 (Friends), n-1 (Public), and n (Public); the default policy for post n+1 is Public.]

Our Goal and Approach

[Figure: Facebook Wall with posts n-2 (Friends), n-1 (Public), and n (Public); the default policy for post n+1 (shown as "?") is predicted by ML.]


Outline

o Data collection methodology
o Survey results
o Machine learning approach
o Results and analysis
o Limitations / Conclusion


Survey Methodology

o Created an online survey
o Advertised on Craigslist and at CMU

Data Collection Method

Participate in a Carnegie Mellon research study on Facebook sharing. Earn $5 for participating in a ~20 minute online study.

We’re looking for English speaking adults, who have used Facebook for at least 4 months, update their Facebook status or post on Facebook at least every other day, and have used more than one privacy setting for their posts.

Please click on the following link to start the online study: http://greyw1.ece.cmu.edu/survey/survey.php

Upon completion of the study, you will receive a $5 Amazon gift card.

Filtering Users


Survey Questions

o Collected demographic data
  – Age, gender, country, level of education
o Degree of agreement with the statements:
  – I have a strong set of privacy rules.
  – I find Facebook's privacy controls confusing.
o Have you ever posted something on a social network and then regretted doing it? If so, what happened?


Facebook App

o Fetched 4 months of users’ posts

[Figure: a fetched post, annotated with its policy and the text in the post.]


Survey Results: Demographics

o 42 participants (avg. 146 posts and 4.6 policies)
o Age: 18 to 65, with an average of 29.1
o 35 female, 7 male
o 39 from USA

[Chart: level of education (High School, College, Advanced)]

Survey Results

Survey Results: Sentiment

[Chart: share of participants (0% to 100%) answering Disagree, Neutral, or Agree to "Have a strong set of privacy rules" and "Find privacy control confusing", and No/Yes to "Regretted posting ever?"]

ML Usage Plan

[Figure: Facebook Wall with posts n-2 (Friends), n-1 (Public), and n (Public); ML predicts the default policy for post n+1.]


Machine Learning

o We use MaxEnt as the ML tool
  – Used Stanford NLP software
o MaxEnt: provides good generalization
  – I.e., helps prevent overfitting
  – Learns probabilistic hypothesis h that outputs probability over labels given data x
  – Chooses hypothesis h that maximizes entropy
    • Subject to a form of agreement with training data
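For reference, the standard conditional MaxEnt formulation (equivalent to multinomial logistic regression) can be written as below. The feature functions f_i, weights λ_i, and empirical distribution p̃ are generic textbook notation and are assumptions here, not symbols taken from the slides; h, x, and the labels y follow the bullets above.

```latex
% Maximize conditional entropy subject to feature-expectation constraints
\max_{h}\; -\sum_{x} \tilde{p}(x) \sum_{y} h(y \mid x)\,\log h(y \mid x)
\quad \text{s.t.} \quad
\sum_{x,y} \tilde{p}(x)\, h(y \mid x)\, f_i(x,y) \;=\; \sum_{x,y} \tilde{p}(x,y)\, f_i(x,y) \quad \forall i

% The solution has the log-linear form
h(y \mid x) \;=\; \frac{\exp\!\big(\sum_i \lambda_i f_i(x,y)\big)}{\sum_{y'} \exp\!\big(\sum_i \lambda_i f_i(x,y')\big)}
```

The constraint is the "form of agreement with training data": the model's expected feature values must match those observed in the training set.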

Machine Learning Approach


Features Considered

o Words and 2-grams in the Facebook post
o Presence of multimedia
o Time of day – morning, evening, night
o Previous post’s policy

o Model (feature set) chosen using cross validation
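The feature list above could be turned into a feature set per post roughly as follows. This is a minimal sketch, not the authors' extraction code: the function names, the whitespace tokenizer, the feature-string format, and the hour boundaries for morning, evening, and night are all assumptions made for illustration.

```python
from datetime import datetime


def time_of_day(ts: datetime) -> str:
    """Bucket a timestamp into the three bins named on the slide.

    The hour boundaries are assumed; the slides do not give them.
    """
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 21:
        return "evening"
    return "night"


def extract_features(text, has_multimedia, posted_at, previous_policy):
    """Return a set of string features for one post (hypothetical format)."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    features = {f"word={w}" for w in tokens}
    # 2-grams: each pair of adjacent tokens
    features |= {f"2gram={a}_{b}" for a, b in zip(tokens, tokens[1:])}
    if has_multimedia:
        features.add("has_multimedia")
    features.add(f"time={time_of_day(posted_at)}")
    if previous_policy is not None:
        features.add(f"prev_policy={previous_policy}")
    return features
```

A call such as `extract_features("Happy birthday Mom", True, datetime(2012, 5, 1, 9, 30), "Friends")` yields word, 2-gram, multimedia, time-of-day, and previous-policy features in one set, which a MaxEnt trainer could consume as binary indicators.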


Temporal Testing

o The data is temporal
o Picked 10 posts randomly as test data
o We simulate a real-world scenario

[Figure: timeline of posts; for each test post, the model is trained on earlier posts to predict that post's policy.]
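One way to realize this temporal test setup is sketched below. It assumes the posts are sorted oldest first and that each test post is predicted from all posts before it; the function name, the seed parameter, and the choice to exclude the very first post as a test candidate are assumptions, not details from the slides.

```python
import random


def temporal_splits(posts, n_test=10, seed=0):
    """Yield (training_posts, test_post) pairs for one user.

    `posts` must be sorted oldest first. Each randomly chosen test post is
    paired with all posts that precede it in time, so the model never sees
    the future.
    """
    rng = random.Random(seed)
    # Skip index 0: the oldest post has no history to train on (assumption).
    candidates = range(1, len(posts))
    test_indices = sorted(rng.sample(candidates, min(n_test, len(candidates))))
    for i in test_indices:
        yield posts[:i], posts[i]
```

Each yielded pair corresponds to one "train to predict" arrow on the timeline: train on `posts[:i]`, then predict the policy of `posts[i]`.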

Training

o Cross-validation to choose features
o May have a different model for each test point
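Choosing a feature set by cross-validation could look roughly like the sketch below. It is an illustration under stated assumptions: `toy_train` is a trivial stand-in for MaxEnt training, the folds are contiguous, and all names are hypothetical. In the setup described here, such a selection would be rerun on the training data of each test point, which is why different test points may end up with different models.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    bounds = [round(j * n / k) for j in range(k + 1)]
    return [list(range(bounds[j], bounds[j + 1])) for j in range(k)]


def toy_train(train_x, train_y):
    """Toy stand-in for MaxEnt training: majority label per feature value."""
    table = {}
    for x, y in zip(train_x, train_y):
        table.setdefault(x, Counter())[y] += 1
    fallback = Counter(train_y).most_common(1)[0][0]

    def predict(x):
        return table[x].most_common(1)[0][0] if x in table else fallback

    return predict


def cv_accuracy(examples, labels, train_fn, k=5):
    """Mean held-out accuracy of `train_fn` over k folds."""
    scores = []
    for fold in k_fold_indices(len(examples), k):
        if not fold:
            continue
        held_out = set(fold)
        train_x = [x for i, x in enumerate(examples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        predict = train_fn(train_x, train_y)
        correct = sum(predict(examples[i]) == labels[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def choose_feature_set(posts, labels, extractors, train_fn, k=5):
    """Return (best extractor name, accuracy per extractor)."""
    results = {
        name: cv_accuracy([extract(p) for p in posts], labels, train_fn, k)
        for name, extract in extractors.items()
    }
    return max(results, key=results.get), results
```

Here `extractors` maps a feature-set name to a function that turns a post into a hashable feature value; swapping `toy_train` for a real MaxEnt trainer would not change the selection logic.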


Baseline Approach

o Previous policy (Facebook’s approach)
  – Use the policy of the last post as the prediction
o Surprisingly good accuracy
  – 0.85 on average
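The previous-policy baseline is simple enough to state in a few lines. The sketch below scores it over all of one user's posts, which is a simplification: the study evaluated on randomly picked test posts. The function name is hypothetical.

```python
def previous_policy_accuracy(policies):
    """Accuracy of predicting each post's policy as the previous post's policy.

    `policies` lists one user's policies, oldest post first. The first post
    has no predecessor and is skipped.
    """
    if len(policies) < 2:
        raise ValueError("need at least two posts")
    hits = sum(prev == cur for prev, cur in zip(policies, policies[1:]))
    return hits / (len(policies) - 1)
```

For the sequence Friends, Public, Public, Public, Friends, two of the four predictions are correct, so the function returns 0.5. Users who rarely change their policy make this baseline hard to beat.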

Results and analysis


MaxEnt Accuracy

Technique                     Accuracy
Baseline (Previous Policy)    0.85
MaxEnt                        0.86


Prediction Mismatch

o Problem: We are not predicting intended policy
  – Instead, predicting implemented policy
o Conjecture:
  – Implemented policy is often incorrect
  – Users just use Facebook’s default policy


Ground Truth Collection

o Feedback on 20 randomly chosen posts
  – Provides ground truth (intended policy)

[Figure: feedback form showing the text of a post and all policies the user has ever used.]


Datasets

[Diagram: Original data (implemented policy) → correct 20 posts based on feedback → Clean data → remove 80% → Pruned clean data]
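One reading of the dataset diagram is sketched below. It assumes that "clean" means overwriting the implemented policy with the intended one wherever feedback exists, and that "pruned clean" means discarding a fraction of the posts that have no feedback. The slides do not say which posts are removed, so the random choice, the dict layout, and the function names are all assumptions.

```python
import random


def build_clean(posts, feedback):
    """Apply feedback: use the intended policy where the user supplied one.

    `posts` is a list of dicts with "id" and "policy" (implemented policy);
    `feedback` maps post id to intended policy.
    """
    clean = []
    for post in posts:
        intended = feedback.get(post["id"])
        clean.append({
            **post,
            "policy": intended if intended is not None else post["policy"],
            "verified": intended is not None,
        })
    return clean


def prune_clean(clean, discard_fraction=0.8, seed=0):
    """Drop a fraction of the unverified posts, keeping every verified post."""
    rng = random.Random(seed)
    unverified = [p for p in clean if not p["verified"]]
    n_drop = round(discard_fraction * len(unverified))
    dropped = {p["id"] for p in rng.sample(unverified, n_drop)}
    return [p for p in clean if p["id"] not in dropped]
```

Under this reading, pruning raises the share of training posts whose label is known to be the intended policy, at the cost of a smaller training set.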


Temporal Testing

o Intended policy known for 20 posts
o Picked 8 of these randomly as test data
o We simulate a real-world scenario


Baseline

o Same previous policy approach as before
o Measure intended accuracy
  – Predict only for posts with known intended policy
  – Better measure of performance
o Baseline intended accuracy: 0.67
  – 0.85 obtained previously on implemented policies


MaxEnt Intended Accuracy

Technique                Intended accuracy
Baseline                 67%
MaxEnt (clean)           71%
MaxEnt (pruned clean)    81%


Confidence About Policy

Confidence Factor (CF): Fraction of posts for which intended policy matched implemented policy

Users binned by confidence factor:

Confidence factor    #Users
0.00-0.25            7
0.26-0.50            7
0.51-0.75            12
0.76-1.00            16
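The confidence factor follows directly from its definition. In the sketch below the function names are hypothetical, and the treatment of values that fall between the printed bin edges (for example 0.255) is an assumption; with 20 feedback posts per user the CF is a multiple of 0.05, so the question does not arise in practice.

```python
def confidence_factor(intended, implemented):
    """Fraction of posts whose intended policy matches the implemented one."""
    if len(intended) != len(implemented) or not intended:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(intended, implemented))
    return matches / len(intended)


def cf_bin(cf):
    """Map a confidence factor to the bin labels used on the slide."""
    if cf <= 0.25:
        return "0.00-0.25"
    if cf <= 0.50:
        return "0.26-0.50"
    if cf <= 0.75:
        return "0.51-0.75"
    return "0.76-1.00"
```

A user whose implemented policy matched the intended one on 15 of 20 posts has CF 0.75 and lands in the 0.51-0.75 bin.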


Analysis of Improvement

[Chart: intended accuracy (0 to 1) of Baseline, MaxEnt (Clean), and MaxEnt (Pruned Clean), for users grouped by confidence factor (#users): 0.00-0.25 (7), 0.26-0.50 (7), 0.51-0.75 (12), 0.76-1.00 (16).]


Limitations

o Only 20 intended policies available
o 42 participants is not a large sample
  – Other studies have used similar numbers
o Richer feature space possible
  – By processing the attachments of the post
o Could use more sophisticated ML techniques


Conclusion

o Accuracy: 67% → 81%
o Accuracy for CF>0.5: 78% → 94%

An approach demonstrating feasibility of learning intended privacy policy of Facebook posts


Discarding “Bad” Data Helps

[Chart: accuracy (0 to 1) versus percentage of “bad” data discarded (20% to 80%).]


Improvement #Participants

[Chart: number of participants (0 to 18) by confidence factor (0-0.25, 0.26-0.50, 0.51-0.75, 0.76-1.00); series: #Participants, #Improvement Clean, #Improvement Pruned Clean.]