The Ad Wars: Retrospective Measurement and Analysis of Anti-Adblock Filter Lists Umar Iqbal*, Zubair Shafiq*, and Zhiyun Qian† The University of Iowa* University of California-Riverside†

The Ad Wars - SIGCOMM · Anti-Adblock Filter Lists Lag Crowdsourced Manually maintained Challenging to keep pace New rules within 100 days Combined EasyList Anti-Adblock Killer 18


The Ad Wars: Retrospective Measurement and Analysis of Anti-Adblock Filter Lists

Umar Iqbal*, Zubair Shafiq*, and Zhiyun Qian†

The University of Iowa* University of California-Riverside†


Agenda

The Ad Wars: Online ads, Adblocking, Anti-Adblocking, Anti-Anti-Adblocking

Contributions: Anti-Adblock filter list analysis, Retrospective coverage analysis, Detecting Anti-Adblock Scripts

Conclusion


Online Advertising

Advertising enables free content:

Publishers show free content

Earn revenue with ads

Problems with ads:

Privacy

Intrusive

Malware

Performance

Solution:

Adblocking


Ad/Tracker Blocking Solutions

Adblocking Extensions: Adblock Plus & Adblock

Tracker Blocking Extensions: Privacy Badger & Ghostery

Ad/Tracker Blocking Browsers: Brave browser & Cliqz browser

Mainstream Ad/Tracker Blocking Browsers: Apple Safari & Google Chrome

How do Adblockers Work?

[Figure: Client, www.example.com, 3rd Party Server, and Ad Server exchanging HTTP requests and HTTP responses for 1st party content, 3rd party content, and ads]

Block HTTP Requests

Block HTML Elements

Crowdsourced Filter Lists: EasyList, Disconnect.me

Publishers vs Adblockers

Acceptable Ads Program:

Whitelisting fee

Transparency concerns

Enabled by default in major adblockers

Use of Anti-Adblockers:

Insert bait elements

Detect adblockers

Prompt to disable adblockers/whitelist website


Anti-Anti-Adblocking

Block/allow bait HTTP requests

Hide/allow bait HTML elements

Use anti-adblocking filter lists

Anti-Adblock Killer

EasyList


Filter List Rules

HTTP Request Filter Rules

Domain anchor ||

Domain tag domain=

HTML Element Filter Rules

Domain restriction

Without domain restriction

Exception Rules

HTTP exception rules

HTML exception rules

! Rule with domain anchor

||example.com

! Rule with domain tag

/example.js$script,domain=example.com

! Rule with domain restriction

example.com###examplebanner

! Rule without domain restriction

###examplebanner

! Exception rule for HTTP request

@@/example.js$script,domain=example1.com

! Exception rule for HTML element

example.com#@##examplebanner
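
The rule types on this slide can be told apart by their separators and prefixes. The following is a minimal sketch, not the authors' tooling: the function name `classify_rule` and the returned labels are hypothetical, and real filter lists contain many more options than are handled here.

```python
def classify_rule(rule: str) -> str:
    """Return a coarse label for one Adblock Plus style filter rule."""
    rule = rule.strip()
    if not rule or rule.startswith("!"):
        return "comment"
    # HTML element exception rules use the "#@#" separator.
    if "#@#" in rule:
        return "html_exception"
    # HTML element rules use the "##" separator; an empty left side
    # means the rule applies without any domain restriction.
    if "##" in rule:
        domains = rule.split("##", 1)[0]
        return "html_with_domain" if domains else "html_without_domain"
    # Everything else is treated as an HTTP request rule.
    if rule.startswith("@@"):
        return "http_exception"
    if rule.startswith("||"):
        return "http_domain_anchor"
    # Options follow the "$" sign; "domain=" is the domain tag.
    if "domain=" in rule.partition("$")[2]:
        return "http_domain_tag"
    return "http_other"
```

Checking for `#@#` before `##` matters, because an HTML exception rule for an id selector also contains `##`.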


Popular Filter Lists

Anti-Adblock Killer (2014)

353 to 1,811 filter rules

6.2 filter rules for every revision

EasyList (2011)

Anti-Adblock sections

67 to 1,317 filter rules

0.6 filter rules per day

Warning Removal List (2013)

4 to 167 filter rules

0.2 filter rules per day

Combined EasyList = EasyList + Warning Removal List


Anti-Adblock Killer vs Combined EasyList

Number of domains:

Anti-Adblock Killer: 1,415

Combined EasyList: 1,394

Common domains: 282

Similar distribution of Alexa ranking

Similar distribution for categories

Exception vs Non-Exception domains:

Combined EasyList 4:1

Anti-Adblock Killer 1:1

Different Strategies of Crafting Anti-Adblocking Rules

Domain Categorization

Anti-Adblock Killer vs Combined EasyList

282 common domains

Promptness in adding new rules:

64% appear first in Combined EasyList

34% appear first in Anti-Adblock Killer

2% appear at the same time

Combined EasyList is More Prompt in Adding New Rules
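
A comparison of this kind could be computed as sketched below, assuming each list has already been reduced to a mapping from domain to the date its first rule appeared. The function name `first_appearance_shares` and the dictionary keys are hypothetical, not taken from the talk.

```python
from datetime import date


def first_appearance_shares(aak: dict, combined_easylist: dict) -> dict:
    """Share of common domains that appear first in each list.

    Both arguments map a domain to the date of its first rule
    (an assumed input format).
    """
    counts = {
        "combined_easylist_first": 0,
        "anti_adblock_killer_first": 0,
        "same_time": 0,
    }
    # Only domains present in both lists can be compared.
    common = aak.keys() & combined_easylist.keys()
    for domain in common:
        if combined_easylist[domain] < aak[domain]:
            counts["combined_easylist_first"] += 1
        elif aak[domain] < combined_easylist[domain]:
            counts["anti_adblock_killer_first"] += 1
        else:
            counts["same_time"] += 1
    total = len(common)
    return {key: (value / total if total else 0.0) for key, value in counts.items()}
```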


The Internet Archive’s Wayback Machine

Archives web pages

279 billion webpages

Archives webpage resources as well

Used in prior literature [USENIX Security ’16]

API to retrieve content

Alexa top 5K websites

5 years (2011 – 2016)

Wayback Machine is incomplete!

robots.txt permissions

Partial snapshots

Outdated URLs

Not archived URLs

Missing Snapshots

Analysis Workflow

Top 5K Alexa domains

Request to the Wayback Machine JSON API

Remove not archived domains

List of Wayback URLs with timestamps

Remove outdated URLs

Request Wayback Machine URLs with Selenium

Store requests/responses and HTML content (Data Repository)

Remove partial snapshots

Match crawled content with anti-adblock filter lists (Filter list matching)
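
The JSON API request and the removal of not archived and outdated URLs could look roughly like the sketch below. It uses the public Wayback Machine availability endpoint; the helper names and the 180-day staleness threshold (`max_days`) are assumptions for illustration, not values stated in the talk. The network call itself is left out so the parsing logic can be read on its own.

```python
import json
from datetime import datetime
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(domain: str, timestamp: str) -> str:
    """Build the availability query for a domain and a YYYYMMDD timestamp."""
    return f"{WAYBACK_API}?{urlencode({'url': domain, 'timestamp': timestamp})}"


def closest_snapshot(response_text: str, requested: str, max_days: int = 180):
    """Return the closest snapshot URL, or None if it is missing or outdated.

    max_days is an assumed threshold for what counts as outdated.
    """
    payload = json.loads(response_text)
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # not archived
    found = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    wanted = datetime.strptime(requested, "%Y%m%d")
    if abs(found - wanted).days > max_days:
        return None  # snapshot is too far from the requested date
    return closest["url"]
```

The URLs that survive this step would then be loaded in a browser (Selenium in the workflow above) so that requests, responses, and HTML can be stored.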


Anti-Adblock Filter Lists Coverage

HTTP matching

HTML matching

Use respective filter lists

Anti-Adblock Killer filter list

Combined EasyList filter list

Number of websites that trigger HTTP rules

Number of websites that trigger HTML rules

331 Websites

16 Websites

5 Websites

4 Websites

Anti-Adblock Killer Filter List Has Better Coverage
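
HTTP matching can be illustrated with a deliberately simplified matcher that handles only pure domain-anchor rules (`||domain`). This is a sketch under that assumption: paths, wildcards, and `$` options are ignored, and the names `matches_domain_anchor` and `websites_triggering` are hypothetical.

```python
from urllib.parse import urlsplit


def matches_domain_anchor(rule: str, request_url: str) -> bool:
    """True if a '||domain' rule matches the host of request_url."""
    if not rule.startswith("||"):
        return False
    # Keep only the domain part of the rule; anything after "^" or "/" is ignored.
    anchor = rule[2:].split("^", 1)[0].split("/", 1)[0].lower()
    if not anchor:
        return False
    host = (urlsplit(request_url).hostname or "").lower()
    # The anchor matches the domain itself and any of its subdomains.
    return host == anchor or host.endswith("." + anchor)


def websites_triggering(rules, crawl):
    """Return the websites with at least one request matched by any rule.

    crawl maps a website to the list of request URLs seen while loading it.
    """
    return {
        site
        for site, urls in crawl.items()
        if any(matches_domain_anchor(rule, url) for rule in rules for url in urls)
    }
```

Counting the size of the returned set per list and per year would give a coverage curve of the kind summarized on this slide.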


Anti-Adblock Filter Lists Coverage

Detection on the Live Web

Alexa top 100K

Anti-Adblock Killer: 4,942 websites

Combined EasyList: 195 websites

Anti-Adblock Killer Filter List Has Better Coverage on the Live Web


Anti-Adblock Filter Lists Lag

Crowdsourced

Manually maintained

Challenging to keep pace

New rules within 100 days:

Combined EasyList: 82% of anti-adblockers

Anti-Adblock Killer: 32% of anti-adblockers

Combined EasyList is More Prompt in Adding New Rules

While Anti-Adblock Killer Has More Coverage
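
The lag figure could be derived as in this sketch, assuming two per-domain dates are available: when an anti-adblocker was first observed on a website and when a matching rule was added to a list. The function name `share_within` and the input format are assumptions; how the talk treats rules that predate the first observation is not stated, so here they simply count as prompt.

```python
from datetime import date


def share_within(first_seen: dict, rule_added: dict, max_days: int = 100) -> float:
    """Fraction of covered anti-adblockers whose rule arrived within max_days."""
    # Only domains that eventually received a rule are considered.
    covered = [domain for domain in first_seen if domain in rule_added]
    if not covered:
        return 0.0
    prompt = sum(
        1
        for domain in covered
        if (rule_added[domain] - first_seen[domain]).days <= max_days
    )
    return prompt / len(covered)
```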


Static Code Analysis

Anti-Adblocking code from 3rd party vendors

Anti-Adblocking code has structural similarities

Static analysis to capture code structure

Fingerprint anti-adblocking JavaScript:

Curtsinger [USENIX Security ’11]

Ikram [PETS ’17]


Anti-Adblock Detection Workflow

Unpack packed JavaScript files with V8 engine (JS file to Unpacked JS file)

Construct ASTs from unpacked JavaScript code

Extract features from ASTs and filter features with low correlation

Train AdaBoost using SVM as base classifier

Classify Anti-Adblocking and Non Anti-Adblocking JavaScripts (Anti-Adblocking JS, Non Anti-Adblocking JS)


JavaScript Code Example

if (ad_element.clientHeight == 0) {

BlockAdBlock = "abp";

}

Feature Extraction

Preprocessing:

Unpack eval() using V8 Engine

Construct Abstract Syntax Tree (AST)

Packed Code

eval("var BlockAdBlock = 'abp';");

Unpacked Code

var BlockAdBlock = 'abp';

Features (context : text):

All (AssignmentExpression:BlockAdBlock)

Literal (Literal:abp)

Keyword (Identifier:clientHeight)

[Figure: AST of the code example, with IfStatement, ExpressionStatement, and Identifier nodes over clientHeight, ad_element, BlockAdBlock, and abp]

Map scripts to a vector space

$$\phi : x \rightarrow (\phi_s(x))_{s \in S}$$

$$\phi_s(x) = \begin{cases} 1, & \text{if } x \text{ contains the feature } s \\ 0, & \text{otherwise} \end{cases}$$
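
The mapping of scripts to a vector space can be sketched as follows, assuming the (context : text) features have already been extracted from each script's AST and are represented as tuples. The helper names and the `offsetWidth` feature are hypothetical and only serve to show a feature the example script does not contain.

```python
def build_feature_space(corpus) -> list:
    """Collect every feature seen in any script into a fixed, sorted order (the set S)."""
    return sorted(set().union(*corpus)) if corpus else []


def feature_vector(script_features: set, feature_space: list) -> list:
    """Binary vector: 1 if the script contains feature s, 0 otherwise."""
    return [1 if s in script_features else 0 for s in feature_space]


# Features of the example script from the slide, as (context, text) pairs.
example = {
    ("AssignmentExpression", "BlockAdBlock"),
    ("Literal", "abp"),
    ("Identifier", "clientHeight"),
}
# A second, hypothetical script that contributes one extra feature.
other = {("Identifier", "offsetWidth")}

space = build_feature_space([example, other])
vector = feature_vector(example, space)
```

Fixing the order of the feature space once, before vectorizing, keeps the vectors of different scripts comparable dimension by dimension.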


Feature Selection & Training

Labeled Data:

372 anti-adblocking

4,021 non anti-adblocking

Feature selection:

Filter using χ² correlation

Reduce features

Classifier training:

AdaBoost + SVM

10-fold cross validation
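
For binary features and binary labels, the χ² filter reduces to a 2x2 contingency table per feature. The sketch below uses the standard closed form for that table; it is an illustration in plain Python, not the authors' pipeline, and the names `chi_square` and `select_top_k` are hypothetical. Classifier training is not shown.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a 2x2 table.

    a: anti-adblocking scripts with the feature
    b: other scripts with the feature
    c: anti-adblocking scripts without the feature
    d: other scripts without the feature
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # feature or label is constant, so it carries no signal
    return n * (a * d - b * c) ** 2 / denominator


def select_top_k(vectors, labels, k: int) -> list:
    """Indices of the k features with the highest chi-square score."""
    scores = []
    for j in range(len(vectors[0])):
        a = sum(1 for v, y in zip(vectors, labels) if v[j] and y)
        b = sum(1 for v, y in zip(vectors, labels) if v[j] and not y)
        c = sum(1 for v, y in zip(vectors, labels) if not v[j] and y)
        d = sum(1 for v, y in zip(vectors, labels) if not v[j] and not y)
        scores.append((chi_square(a, b, c, d), j))
    # Highest score first; ties are broken by feature index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores[:k]]
```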


Results & Evaluation

Feature Set | Classifier     | Number of Features | TP rate (%) | FP rate (%)
all         | AdaBoost + SVM | 10K                | 99.6        | 3.9
literal     | AdaBoost + SVM | 10K                | 99.6        | 3.9
keyword     | AdaBoost + SVM | 1K                 | 99.7        | 3.2

Results in terms of:

True Positive (TP) rate: correctly classified anti-adblocking scripts

False Positive (FP) rate: scripts incorrectly classified as anti-adblocking

Test in the wild on Alexa top 100K websites:

2,701 detected anti-adblockers

TP rate of 92.5%

Complement manual analysis:

Periodic crawl to expedite manual process

Substantial reduction of manual effort
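
The two rates reported on this slide follow the usual definitions and can be computed from labels and predictions as sketched here. The function name `tp_fp_rates` is hypothetical, and the sample inputs in the usage lines are made up for illustration; they are not the talk's data.

```python
def tp_fp_rates(labels, predictions):
    """Return (TP rate, FP rate) for binary labels and predictions.

    A label or prediction of 1 means anti-adblocking, 0 means anything else.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


# Illustrative inputs only: four anti-adblocking scripts, four others.
rates = tp_fp_rates([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0])
```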


Key Takeaways

Comprehensive measurement study of anti-adblocking filter lists:

Retrospective analysis on Alexa top 5K websites from 2011 to 2016

Effectiveness and evolution

Lightweight machine learning approach:

Static analysis to detect anti-adblocking scripts

Complement filter list rule creation

The Wayback Machine enables retrospective analysis:

Can be used to study similar filter lists

Malware, Tracking, Censorship


Questions?

Umar Iqbal

www.umariqbal.com

@umaarr6


References

A. Lerner, A. K. Simpson, T. Kohno, and F. Roesner. Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016. In USENIX Security Symposium, 2016.

C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert. ZOZZLE: Fast and Precise In-Browser JavaScript Malware Detection. In USENIX Security Symposium, 2011.

M. Ikram, H. J. Asghar, M. A. Kaafar, A. Mahanti, and B. Krishnamurthy. Towards Seamless Tracking-Free Web: Improved Detection of Trackers via One-class Learning. In Privacy Enhancing Technologies Symposium (PETS), 2017.
