22
Man vs. Machine: Adversarial Detection of Malicious Crowdsourcing Workers Gang Wang, Tianyi Wang, Haitao Zheng, Ben Y. Zhao UC Santa Barbara [email protected]

Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

  • Upload
    others

  • View
    18

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Man vs. Machine: Adversarial Detection of Malicious Crowdsourcing Workers

Gang  Wang,  Tianyi  Wang,  Haitao  Zheng,  Ben  Y.  Zhao    

UC  Santa  Barbara  [email protected]  

Page 2: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Machine Learning for Security

•  Machine learning (ML) to solve security problems –  Email spam detection

–  Intrusion/malware detection

–  Authentication

–  Identifying fraudulent accounts (Sybils) and content

•  Example: ML for Sybil detection in social networks

2

Training   Classifier  

Known  samples  

Unknown  Accounts  

Page 3: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Adversarial Machine Learning

•  Key vulnerabilities of machine learning systems –  ML models derived from fixed datasets

–  Assuming similar distribution of training and real-world data

•  Strong adversaries in ML systems –  Aware of usage, reverse engineering ML systems

–  Adaptive evasion, temper with the trained model

•  Practical adversarial attacks –  What are the practical constrains for adversaries?

–  With constrains, how effective are adversarial attacks?

3

Page 4: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Context: Malicious Crowdsourcing

•  New threat: malicious crowdsourcing = crowdturfing –  Hiring a large army of real users for malicious attacks

–  Fake customer reviews, rumors, targeted spam

–  Most existing defenses fail against real users (CAPTCHA)

4

Page 5: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Online Crowdturfing Systems

•  Online crowdturfing systems (services) –  Connect customers with online users willing to spam for money

–  Sites located across the glob, e.g. China, US, India

•  Crowdturfing in China –  Largest crowdturfing sites: ZhuBaJie (ZBJ) and SanDaHa (SDH)

–  Million-dollar industry, tens of millions of tasks finished 5

Customer  

Crowd  workers  

…  Crowdturfing  site   Target  Network  

Page 6: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Machine Learning vs. Crowdturfing

6

•  Machine learning to detect crowdturfing workers –  Simple methods usually fail (e.g. CAPTCHA, rate limit)

–  Machine learning: more sophisticated modeling on user behaviors o  “You are how you click” [USENIX’13]

•  Perfect context to study adversarial machine learning

1.  Highly adaptive workers seeking evasion

2.  Crowdturfing site admins tamper with training data by changing all worker behaviors

Page 7: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Goals and Questions

•  Our goals –  Develop defense against crowdturfing on Weibo (Chinese Twitter)

–  Understand the impact of adversarial countermeasures and the robustness of machine learning classifiers

•  Key questions –  What ML algorithms can accurately detect crowdturfing workers?

–  What are possible ways for adversaries to evade classifiers?

–  Can adversaries attack ML models by tampering with training data?

7

Page 8: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Outline

•  Motivation

•  Detection of Crowdturfing

•  Adversarial Machine Learning Attacks

•  Conclusion

8

Page 9: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

•  Detect crowdturf workers on Weibo

•  Adversarial machine learning attacks –  Evasion Attack: workers evade classifiers

–  Poisoning Attack: crowdturfing admins tamper with training data

Methodology

9

Classifier  

Training  Data  

Training  (e.g.  SVM)  

Poison  AMack  

Evasion  AMack  

Page 10: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Ground-truth Dataset

•  Crowdturfing campaigns targeting Weibo –  Two largest crowdturfing sites ZBJ and SDH

–  Complete historical transaction records for 3 years (2009-2013)

–  20,416 Weibo campaigns: > 1M tasks, 28,947 Weibo accounts

•  Collect Weibo profiles and their latest tweets –  Workers: 28K Weibo accounts used by ZBJ and SDH workers

–  Baseline users: snowball sampled 371K baseline users

10

Page 11: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Features to Detect Crowd-workers

•  Search for behavioral features to detect workers

•  Observations –  Aged, well established accounts –  Balanced follower-followee ratio –  Using cover traffic

•  Final set of useful features: 35 –  Baseline profile fields (9) –  User interaction (comment, retweet) (8) –  Tweeting device and client (5) –  Burstiness of tweeting (12) –  Periodical patterns (1)

11

Task-driven nature

Active at posting but have less bidirectional interactions

Page 12: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Performance of Classifiers

•  Building classifiers on ground-truth data –  Random Forests (RF) –  Decision Tree (J48) –  SVM radius kernel (SVMr) –  SVM polynomial (SVMp) –  Naïve Bayes (NB) –  Bayes Network (BN)

•  Classifiers dedicated to detect “professional” workers

–  Workers who performed > 100 tasks –  Responsible for 90% of total spam –  More accurate to detect the professionals è 99% accuracy

12

0%  10%  20%  30%  40%  50%  60%  

RF   J48   SVMr  SVMp   BN   NB  

False  Posi\ve  Rate  False  Nega\ve  Rate  

Random Forests: 95% accuracy

Page 13: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Outline

•  Motivation

•  Detection of Crowdturfing

•  Adversarial Machine Learning Attacks

–  Evasion attack

–  Poisoning attack

•  Conclusion

13

Page 14: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

14

Classifier  

Training  Data  

Training  (e.g.  SVM)  

Evasion  AMack  

Detec\on  Model  Training  

Page 15: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Attack #1: Adversarial Evasion

15

•  Individual workers as adversaries –  Workers seek to evade a classifier by mimicking normal users

–  Identify the key set of features to modify for evasion

•  Attack strategy depends on worker’s knowledge on classifier

–  Learning algorithm, feature space, training data

•  What knowledge is practically available? How does different knowledge level impact workers’ evasion?

Page 16: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

A Set of Evasion Models

•  Optimal evasion scenarios –  Per-worker optimal: Each worker has perfect

knowledge about the classifier

–  Global optimal: knows the direction of the boundary

–  Feature-aware evasion: knows feature ranking

•  Practical evasion scenario –  Only knows normal users statistics

–  Estimate which of their features are most “abnormal”

16

?  ?  

?  ?  

Prac\cal  

Op\mal  

Classifica\on  boundary  

Page 17: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Evasion Attack Results

17

0  

20  

40  

60  

80  

100  

0   10   20   30  Worker  E

vasion

 Rate  (%

)  

Number  of  Features  Altered  

J48  

SVMp  

RF  

SVMr  

•  Evasion is highly effective with perfect knowledge, but less effective in practice

•  Most classifiers are vulnerable to evasion –  Random Forests are slightly more robust (J48 Tree the worst)

99% workers succeed with 5 feature changes

Op@mal  AAack  

0  

20  

40  

60  

80  

100  

0   10   20   30  Worker  E

vasion

 Rate  (%

)  Number  of  Features  Altered  

J48  

SVMp  

RF  

SVMr  

Prac@cal  AAack  

No  single  classifier  is  robust  against  evasion.  The  key  is  to  limit  adversaries’  knowledge  

Need to alter 20 features

Page 18: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

18

Classifier  

Training  Data  

Training  (e.g.  SVM)  

Poison  AMack  

Detec\on  Model  Training  

Page 19: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Attack #2: Poisoning Attack

•  Crowdturfing site admins as adversaries –  Highly motivated to protect their workers, centrally control workers

–  Tamper with the training data to manipulate model training •  Two practical poisoning methods

–  Inject mislabeled samples to training data è wrong classifier

–  Alter worker behaviors uniformly by enforcing system policies è harder to train accurate classifiers

19

Injec\on  AMack    

Inject  normal  accounts,    but  labeled  as  worker  

Wrong  model,  false  posi\ves!  

Altering  AMack  Difficult  to  classify!  

Page 20: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Injecting Poison Samples

•  Injecting benign accounts as “workers” into training data –  Aim to trigger false positives during detection

20

0  

5  

10  

15  

20  

0   0.2   0.4   0.6   0.8   1  

False  Po

si@v

e  Ra

te  (%

)  

Ra@o  of  Poison-­‐to-­‐Turfing    

Tree  

SVMp  

RF  

SVMr  J48-Tree is more vulnerable than others

Poisoning  aMack  is  highly  effec\ve  More  accurate  classifier  can  be  more  vulnerable    

10% of poison samples è boost false positives by 5%

Page 21: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Discussion

•  Key observations –  Accurate machine learning classifiers can be highly vulnerable

–  No single classifier excels in all attack scenarios, Random Forests and SVM are more robust than Decision Tree.

–  Adversarial attack impact highly depends on adversaries’ knowledge

•  Moving forward: improve robustness of ML classifiers

–  Multiple classifier in one detector (ensemble learning) –  Adversarial analysis in unsupervised learning

21

Page 22: Man vs. Machine: Adversarial Detection of Malicious ... · Machine Learning for Security • Machine learning (ML) to solve security problems – Email spam detection – Intrusion/malware

Thank You! Questions?

22