The art of breaking and designing captchas
Elie Bursztein, Stanford University
Session ID: HT2-402
Session Classification: Intermediate
Elie Bursztein (@elie), http://elie.im
World's Most Popular Captchas
Captcha Design Goal
AI ?
Human
sweet spot
Focus of this talk
How to break and design CAPTCHAs
Based on the analysis of 21 of the most popular schemes
Outline
How to break text captchas
How to break audio captchas
How to make captchas easier for humans
What's next?
Evaluation metrics
Accuracy
Learnability
Solving time
How to Break Text-Captchas
Think Lego
How to break a captcha: example
Pre-processing: captcha binarization
Pre-processing: background removal
Pre-processing: line detection
Pre-processing: line removal
Segmentation: clustering algorithm
Segmentation: cluster separation
Post-segmentation: inverting rotation
Recognition: 3173
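The binarization and segmentation steps can be sketched as follows. This is a minimal illustration, not the breaker shown in the talk: the fixed threshold of 128 and the use of 4-connected flood fill as the clustering algorithm are assumptions, and the function names `binarize` and `segment` are hypothetical.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale matrix (0-255) to 1 for ink (dark) and 0 for background.

    The threshold value is an assumption chosen for illustration.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary):
    """Group 4-connected ink pixels into clusters, ordered left to right.

    Flood fill stands in for the clustering algorithm; the slides do not
    say which algorithm was used.
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    clusters = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(pixels)
    # Order clusters by their leftmost column so they read like text.
    clusters.sort(key=lambda pixels: min(x for _, x in pixels))
    return clusters


# Toy 3x5 "image": two dark strokes on a white background.
gray = [
    [255, 10, 255, 255, 20],
    [255, 10, 255, 255, 20],
    [255, 255, 255, 255, 20],
]
clusters = segment(binarize(gray))
```

Collapsed characters or lines crossing the text would merge into one cluster under this approach, which is why the pipeline lists line removal and cluster separation as separate steps.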
Breaker 5-Stage Pipeline (Slashdot captcha)
Preprocessing
Segmentation
Post-segmentation
Recognition: f a e t e s t
Post-recognition: f a s t e s t
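The post-recognition stage turns the raw reading "faetest" into "fastest". One way to sketch that correction is a closest-match lookup against a word list. This assumes the captcha answers are dictionary words; the word list and the helper name `post_recognize` are hypothetical, and the slides do not say which correction method was used.

```python
import difflib


def post_recognize(raw, dictionary, cutoff=0.6):
    """Return the dictionary word closest to the raw reading.

    Falls back to the raw reading when nothing is similar enough.
    """
    matches = difflib.get_close_matches(raw, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


WORDS = ["fastest", "slowest", "captcha"]  # hypothetical word list
corrected = post_recognize("faetest", WORDS)
```

A single misread character is recovered this way without retraining the recognizer, which is what makes word-based schemes weaker than random strings of the same length.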
From the image to the matrix representation
From the matrix representation to the vector representation
[Figure: matrix lines L1 to L6 laid end to end to form one vector]
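As a rough illustration of this step, the sketch below concatenates the lines of a binary segment matrix into one flat vector. It assumes L1, L2, ... are the matrix rows taken top to bottom; the helper name `matrix_to_vector` is hypothetical.

```python
def matrix_to_vector(matrix):
    """Concatenate the rows (L1, L2, ...) of a matrix into one flat vector."""
    return [value for row in matrix for value in row]


# Toy 3x3 binary segment (1 = ink, 0 = background).
segment_matrix = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
vector = matrix_to_vector(segment_matrix)
```

Every segment must be scaled to the same matrix size first, otherwise the resulting vectors have different lengths and cannot be compared.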
From the vector representation to the segment value (classification)
Known vector   Distance
B              42
B              40
A              32
A              70
C              12
C              18
Nearest known vector: C (distance 12)
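The classification step picks the label of the known vector with the smallest distance. A minimal nearest-neighbor sketch follows; the Hamming distance and the toy vectors are assumptions, since the slides do not state which distance was used.

```python
def hamming(a, b):
    """Count the positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(vector, known):
    """Return the label of the known vector closest to `vector` (1-nearest neighbor).

    `known` is a list of (label, vector) pairs built from labeled segments.
    """
    label, _ = min(known, key=lambda item: hamming(vector, item[1]))
    return label


# Hypothetical training vectors for three letters.
KNOWN = [
    ("A", [0, 0, 1, 1]),
    ("B", [1, 0, 0, 1]),
    ("C", [1, 1, 0, 1]),
]
guess = classify([1, 1, 0, 0], KNOWN)
```

Here the distances to A, B and C are 4, 2 and 1, so the segment is read as C.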
Breaker efficiency
Solver accuracy = Coverage * Precision^length
Coverage: segmentation rate
Precision: recognition rate
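To make the formula concrete, here is a small worked example. The input values (90% coverage, 95% per-character precision, 6 characters) are made up for illustration and are not measurements from the talk.

```python
def solver_accuracy(coverage, precision, length):
    """Solver accuracy = coverage * precision ** length.

    coverage:  fraction of captchas that are segmented correctly
    precision: per-character recognition rate
    length:    number of characters in the captcha
    """
    return coverage * precision ** length


# Illustrative numbers only.
estimate = solver_accuracy(coverage=0.9, precision=0.95, length=6)
```

With these inputs the estimate is about 0.66. Because precision is raised to the power of the length, even a strong per-character recognizer loses ground quickly as captchas get longer.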
Anti-recognition techniques
Blurring
Distortion
Rotation
Fonts
Charsets 0123456789
SVM learning rate
KNN learning rate
Anti-recognition taxonomy
Background Confusion
Lines
Collapsing
Breaking World of Warcraft
Breaking Captcha.net
Breaking Wikipedia
Breaking Digg
Breaking Slashdot
Breaking eBay
Failing to break eBay
Breaking Baidu
Scheme        Segmentation rate   Solving rate
Authorize     84%                 66%
Baidu         98%                 5%
Blizzard      75%                 70%
Captcha.net   96%                 73%
CNN           50%                 16%
Digg          86%                 20%
eBay          95%                 43%
Google        0%                  0%
MegaUpload    n/a                 93%
NIH           87%                 72%
Recaptcha     0%                  0%
Reddit        71%                 42%
Skyrock       30%                 2%
Slashdot      52%                 35%
Wikipedia     57%                 25%
Learning rate for real schemes
Guidelines for building a breaker
Immediate visual feedback
Visual debugging
Algorithm independence
Exposing algorithm parameters
Decaptcha main interface
Apply design principles
Core design principles
Randomize length
Randomize character size
Wave the captcha
Use anti-recognition as a means of strengthening captcha security
Don't use a complex charset: it is bad for humans (see our research on this) and useless for security
Use collapsing or lines
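The first three principles (random length, random character size, waving) can be sketched as a layout generator that a rendering step would consume. Everything here is an assumption made for illustration: the function name `captcha_layout`, the parameter ranges, and the per-character sine offset, which is only a crude stand-in for waving the whole image.

```python
import math
import random


def captcha_layout(rng, charset="abcdefghijklmnopqrstuvwxyz",
                   min_len=5, max_len=8, min_size=24, max_size=40,
                   wave_amplitude=6.0, wave_period=4.0):
    """Return one glyph description per character of a new captcha.

    Length, glyph size and wave phase are all drawn at random so that
    a breaker cannot rely on fixed geometry.
    """
    length = rng.randint(min_len, max_len)      # randomize length
    phase = rng.uniform(0, 2 * math.pi)         # random start of the wave
    layout = []
    for i in range(length):
        layout.append({
            "char": rng.choice(charset),        # simple charset, per the slide
            "size": rng.randint(min_size, max_size),  # randomize character size
            "y_offset": wave_amplitude
            * math.sin(2 * math.pi * i / wave_period + phase),
        })
    return layout


# A seeded generator keeps the example reproducible; a production system
# would want an unpredictable source such as random.SystemRandom().
layout = captcha_layout(random.Random(42))
```

The charset is kept to lowercase letters on purpose, in line with the advice against complex charsets.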
Designing Better Captchas
Think Lego again
Decompose into features
Analyze each feature in isolation
Analyze feature interactions
Real vs Generated
Real vs Generated
Evaluation system
Experiment details
Some of the features tested
Angle of rotation
Collapsing
Character size
Resolution invariant
2D interactions
Length vs Angle interaction
Perception Does Not Match Number
How to Break Audio-Captcha
Audio Captchas
Creating Audio Captcha
[Figure: a captcha maker mixes voices and noises to produce a "super secure captcha"]
Noise intensity (RMS/SNR)
[Figure: sample audio captchas from Microsoft, Digg, and Authorize, annotated with the spoken characters]
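Noise intensity can be quantified with the root mean square (RMS) of the samples and the signal-to-noise ratio (SNR). The sketch below uses the standard definitions; the sample values are invented, and it assumes the voice and noise tracks are available separately, which is true for a captcha generator but not for an attacker.

```python
import math


def rms(samples):
    """Root mean square amplitude of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(voice, noise):
    """Signal-to-noise ratio in decibels, from the RMS of each track."""
    return 20 * math.log10(rms(voice) / rms(noise))


# Made-up samples: the voice is ten times louder than the noise.
voice = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]
ratio = snr_db(voice, noise)
```

A tenfold amplitude ratio corresponds to 20 dB. Lower values mean the noise is closer in level to the voice.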
Sound representation
WAV DFT
Cep
TFR
TCR
TDC
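Going from raw WAV samples to a frequency representation is what the DFT step does. A naive transform is enough to show the idea; this is a textbook O(n^2) version for illustration, and real code would use an FFT library. The test tone is synthetic.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin for a list of real samples."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n)
    ]


# Synthetic tone that completes exactly 3 cycles over 16 samples.
n = 16
tone = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
spectrum = dft_magnitudes(tone)
# Only look at the first half: the second half mirrors it for real input.
peak_bin = max(range(n // 2), key=lambda k: spectrum[k])
```

The energy of the tone lands in bin 3, which is the kind of compact feature a classifier can work with more easily than raw samples.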
Solving an audio captcha
Dealing with random noise
Statistical learning
Supervised learning
RLS (regularized least squares) classifier
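A regularized least squares classifier fits weights w that minimize ||Xw - y||^2 + lambda * ||w||^2 and classifies by the sign of the score. The sketch below is a minimal two-class version in plain Python. The two-dimensional features, the labels and the value of lambda are invented for illustration; a real audio breaker would use many more features and one classifier per character.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rls_train(features, labels, lam=0.1):
    """Return w solving (X^T X + lam * I) w = X^T y."""
    d = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(f[i] * y for f, y in zip(features, labels)) for i in range(d)]
    return solve(xtx, xty)


def rls_predict(w, feature):
    """Classify by the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, feature)) >= 0 else -1


# Hypothetical 2-D features for two classes labeled +1 and -1.
X = [[2.0, 0.1], [1.8, 0.3], [0.2, 1.9], [0.1, 2.2]]
y = [1, 1, -1, -1]
w = rls_train(X, y)
```

The regularization term keeps the linear system well conditioned even when features are nearly redundant, which is common with noisy audio features.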
Semantic noise
Results
Scheme      Length   Coverage (%)   Digit (%)   Captcha
Authorize   5        100            97          89.2%
Digg        5        100            76          41.4%
eBay        6        85.6           92.5        82.9%
Microsoft   10       80.6           89.6        48.9%
Recaptcha   8        99.9           40.5        1.5%
Yahoo       7        99.1           74.7        45.4%
Recaptcha semantic noise
Confusion matrices
How many captchas do you need?
Apply
Within 3 months
Make sure you have a strong captcha scheme
Ensure that your site is accessible
Within 6 months
Log your captcha failure rate and monitor it
Have a backup captcha scheme in case your scheme is broken
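Logging and monitoring the failure rate could look like the sliding-window sketch below. The class name `FailureRateMonitor`, the window size and the alert band are hypothetical choices. The idea is only that a failure rate drifting outside its usual range deserves a look, for example a sudden drop may mean an automated solver is at work and a sudden rise may mean a usability problem.

```python
from collections import deque


class FailureRateMonitor:
    """Track the captcha failure rate over the most recent attempts."""

    def __init__(self, window=1000, low=0.05, high=0.6):
        self.results = deque(maxlen=window)  # True = solved, False = failed
        self.low = low
        self.high = high

    def record(self, solved):
        """Log the outcome of one captcha attempt."""
        self.results.append(bool(solved))

    def failure_rate(self):
        """Fraction of failed attempts in the window, or None if empty."""
        if not self.results:
            return None
        return 1 - sum(self.results) / len(self.results)

    def out_of_band(self):
        """True when the failure rate leaves the expected [low, high] band."""
        rate = self.failure_rate()
        return rate is not None and not (self.low <= rate <= self.high)


# Tiny window so the example is easy to follow by hand.
monitor = FailureRateMonitor(window=4, low=0.1, high=0.6)
for outcome in (True, True, False, True):
    monitor.record(outcome)
```

After these four attempts the failure rate is 0.25, inside the band. An out-of-band reading would be the signal to investigate and, if needed, switch to the backup scheme.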
Thank you!
Questions?
Follow me on Twitter: @elie
Captcha research: http://elie.im/captcha