An examination of how behavioral analytics can be leveraged to design better defenses in complex user-facing platforms.
A Million Mousetraps
Using Big Data and Little Loops to Build Better Defenses
Allison Miller
Overview
Protecting customers on an open platform
Big data + Little loops enable automation via analytics
Decisions as defenses
Putting your data to work
the interdependent system
the porous attack surface
so, about that perimeter...
Spam
Credential Theft
Malware
Bots
Account Takeover
Fraud
DoS
Phishing
Griefers
Scammers
The Better Mousetrap
Automates defensive action cross-platform
- Fast: in real time, or at least in time to minimize loss
- Accurate: reasonable false positives; as good as a human specialist
- Cheap: reduces more loss than the cost it creates; cheaper than manual intervention
BIG DATA & LITTLE LOOPS
[Figure: a wall of raw Apache access-log and error-log lines (page requests, server start-up notices, mod_security denials), representing unprocessed data]
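Raw logs like the ones on this slide only become data you can analyze once they are parsed. The access-log entries appear to follow Apache's combined log format (host, identity, user, timestamp, request, status, bytes, referer, user agent). A minimal sketch under that assumption: a regular expression turns one line into typed fields, and lines that don't fit the pattern, such as error-log entries, return `None`. The sample line is copied from the slide; `parse_line` and `COMBINED_LOG` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed format: Apache "combined" log
# host ident authuser [timestamp] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Turn one raw access-log line into a dict of typed fields, or None."""
    m = COMBINED_LOG.match(line)
    if m is None:
        return None  # error-log lines and malformed entries are skipped
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

sample = ('123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] '
          '"GET /pics/wpaper.gif HTTP/1.0" 200 6248 '
          '"http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"')
record = parse_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
```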
* Loop Disposition: Logic, Human, or Other?
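One way to read "little loops" is as small feedback cycles: each event is disposed of by logic or by a human, and confirmed outcomes flow back to tune the logic. The sketch below is an interpretation, not the talk's design; the `LittleLoop` class, its thresholds, and its retuning rule are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class LittleLoop:
    """Hypothetical feedback loop: score -> disposition -> outcome -> retune."""
    block_above: float = 0.9    # assumed: logic blocks automatically at or above this
    review_above: float = 0.5   # assumed: a human reviews scores in [review_above, block_above)
    labels: list = field(default_factory=list)  # (score, was_bad) pairs

    def dispose(self, score):
        """Route one scored event to logic or to a human."""
        if score >= self.block_above:
            return "logic:block"
        if score >= self.review_above:
            return "human:review"
        return "logic:allow"

    def record_outcome(self, score, was_bad):
        """Feed a confirmed outcome (reviewer verdict, chargeback, report) back in."""
        self.labels.append((score, was_bad))

    def retune(self):
        """Lower the auto-block threshold to the lowest score at or above which
        every labeled event was bad, so fewer cases need a human next time."""
        lowest_all_bad = None
        # Highest scores first; on ties, good outcomes sort first (conservative).
        for score, was_bad in sorted(self.labels, key=lambda t: (-t[0], t[1])):
            if not was_bad:
                break
            lowest_all_bad = score
        if lowest_all_bad is not None and self.review_above < lowest_all_bad < self.block_above:
            self.block_above = lowest_all_bad

loop = LittleLoop()
for score, was_bad in [(0.95, True), (0.8, True), (0.7, True), (0.6, False)]:
    loop.record_outcome(score, was_bad)
loop.retune()
print(loop.block_above, loop.dispose(0.75))
```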
APPLIED RISK ANALYTICS
Use of technology, data, research, and statistics to solve problems associated with losses or costs due to security vulnerabilities or gaps in a system, resulting in the deployment of optimized detection, prevention, or response capabilities.
BRIEF TANGENT
WHAT IS THE DIFFERENCE BETWEEN RISK ANALYTICS AND RISK METRICS?
Such as...
Metrics | Analytics
$ loss (txns) | Purchase trends of high-loss users
# compromised accts | IP sources of bad login attempts
% of spam messages delivered | Spam subject lines generating the most clicks
Minutes of downtime | Most process-intensive applications
# customer contacts generated | Highest-contact exception flows
YMMV
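To make the distinction concrete with one pair from the table: a metric counts bad login attempts, while an analytic breaks them down by source IP so there is something to act on. The login events below are made up, and the addresses come from the documentation ranges reserved in RFC 5737.

```python
from collections import Counter

# Hypothetical login events: (source_ip, succeeded)
logins = [
    ("203.0.113.5", False), ("203.0.113.5", False), ("198.51.100.7", True),
    ("203.0.113.5", False), ("192.0.2.44", False), ("198.51.100.7", False),
]

# Metric: one number that tracks the size of the problem.
bad_attempts = sum(1 for _, ok in logins if not ok)

# Analytic: a breakdown that points at where to act (IP sources of bad logins).
bad_sources = Counter(ip for ip, ok in logins if not ok)

print(bad_attempts)                # 5
print(bad_sources.most_common(1))  # [('203.0.113.5', 3)]
```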
END TANGENT
Applied where?
- Where risks manifest in observable behavior
- Where system owners make decisions
- Where controls can be optimized by better recognizing identity, intent, or change
Decisions, Decisions
Population \ Response | Authorize | Block
Good | correct | false positive: good action gets blocked
Bad | false negative: bad action gets through | correct
Incorrect decisions have a cost; correct decisions are free (usually).
Downstream impacts
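The grid can be turned into arithmetic by counting each cell and pricing the two error cells. This sketch assumes per-case costs for a false positive and a false negative (the dollar figures are invented) and, following the slide, treats correct decisions as free.

```python
# Hypothetical per-case costs; real figures come from loss and support data.
FALSE_POSITIVE_COST = 15.0    # good action gets blocked: lost sale, customer contact
FALSE_NEGATIVE_COST = 120.0   # bad action gets through: fraud loss

def decision_outcomes(cases):
    """Count each cell of the population x response grid.
    Each case is (truly_bad, blocked)."""
    counts = {"correct_block": 0, "false_positive": 0,
              "false_negative": 0, "correct_authorize": 0}
    for truly_bad, blocked in cases:
        if blocked and truly_bad:
            counts["correct_block"] += 1
        elif blocked:
            counts["false_positive"] += 1
        elif truly_bad:
            counts["false_negative"] += 1
        else:
            counts["correct_authorize"] += 1
    return counts

def decision_cost(counts):
    """Correct decisions are treated as free; only the two error cells cost money."""
    return (counts["false_positive"] * FALSE_POSITIVE_COST
            + counts["false_negative"] * FALSE_NEGATIVE_COST)

cases = [(True, True), (False, True), (True, False), (False, False), (False, False)]
counts = decision_outcomes(cases)
print(counts, decision_cost(counts))
```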
BIG DATA & LITTLE LOOPS
"Why are you picking on me?"
"Boo-yah! Still getting away with it."
"<Sigh> Nobody understands me."
Such as...
- Populations: users, transactions, messages, packets, API calls, files
- Actions: allow, block, challenge, review, retry, quarantine, add privileges, upgrade privileges, make offer
- Costs: fraud, data leakage, customer churn, customer contacts, downstream liability
Applying Decisions
Risk management is decision management
ACTOR ATTEMPTS ACTION
SUBMIT
WHAT IS THE REQUEST?
HOW TO HONOR THE REQUEST?
SHOULD WE HONOR?
RESULT: ACTION OCCURS
For example:
ACTOR ATTEMPTS PAYMENT -> SUBMIT
p(actor attempting payment is accountholder) = f(variable A + variable B + ...)
Decision: Authorize, Review, Refer, Request Authentication, or Decline
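A minimal sketch of the payment example, assuming `f` is a logistic function of a weighted sum (one common choice, not necessarily the one behind the slide). The variable names, weights, score bands, and the ordering of the middle actions are hypothetical.

```python
import math

# Hypothetical variables and weights; a real model would learn these from data.
WEIGHTS = {"ip_billing_country_mismatch": -1.5, "new_shipping_address": -0.8,
           "known_device": 1.2}
INTERCEPT = 2.0

# Hypothetical score floors, checked from most to least trusting.
BANDS = [(0.95, "Authorize"), (0.80, "Review"), (0.60, "Refer"),
         (0.40, "Request Authentication"), (0.0, "Decline")]

def p_accountholder(features):
    """p = f(A + B + ...): a logistic function of a weighted sum of variables."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def decide(p):
    """Map the probability to the first band whose floor it clears."""
    for floor, action in BANDS:
        if p >= floor:
            return action
    return "Decline"

print(decide(p_accountholder({"known_device": 1})))  # Authorize
print(decide(p_accountholder({"ip_billing_country_mismatch": 1,
                               "new_shipping_address": 1})))  # Request Authentication
```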
Flavors of Risk Models
- "I deviate significantly from a normal (good) pattern": fa(x), fb(x), fc(x)
- "I summarize a known bad pattern": fq(x), fr(x), fs(x)
What is normal?
http://en.wikipedia.org/wiki/Normal_distribution
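The two flavors can be sketched side by side. Flavor one scores deviation from a user's own normal with a z-score, which assumes the history is roughly normally distributed; flavor two encodes a known bad pattern as a rule, here borrowing the prepaid-phone pattern the deck studies later. Field names and data are hypothetical.

```python
import statistics

def deviation_score(history, value):
    """Flavor 1: distance from this user's normal, in standard deviations (z-score)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs 2+ points
    return abs(value - mean) / stdev if stdev else 0.0

def matches_known_bad(txn):
    """Flavor 2: a known bad pattern written as a rule (hypothetical field names)."""
    return (txn["ip_country"] != txn["billing_country"]
            and txn["category"] == "prepaid_phones"
            and txn["new_shipping_address"])

past_amounts = [40, 55, 38, 60, 47]  # hypothetical purchase history
print(round(deviation_score(past_amounts, 900), 1))  # far outside this user's normal
print(matches_known_bad({"ip_country": "NG", "billing_country": "US",
                         "category": "prepaid_phones",
                         "new_shipping_address": True}))  # True
```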
WHAT IS BAD? WHAT IS GOOD?
Study history...
Who
What
Where
When
Why
And then?
Study history...
- User IP country <> billing country
- Buying prepaid mobile phones
- Add new shipping address in cart
However...
- Buyer = phone reseller, static machine ID
How much $$ is at risk? What is “normal” for this customer? What “bad” profiles does this match?
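“How much $$ is at risk?” is often answered as expected loss, p(bad) × amount, and compared with the cost of intervening, echoing the mousetrap requirement to reduce more loss than cost created. This sketch assumes an intervention prevents the whole loss; the probabilities, amounts, and review cost are invented.

```python
def expected_loss(p_bad, amount):
    """Dollars at risk: probability the transaction is bad times its amount."""
    return p_bad * amount

def worth_intervening(p_bad, amount, intervention_cost):
    """Intervene only if the loss expected to be prevented exceeds the cost of acting."""
    return expected_loss(p_bad, amount) > intervention_cost

# Hypothetical numbers: a $500 order vs a $20 order, each with a $5 manual review.
print(worth_intervening(0.10, 500, 5.0))  # True  (expected loss about $50)
print(worth_intervening(0.10, 20, 5.0))   # False (expected loss about $2)
```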
SHALL WE PLAY A GAME?
(SINCE WE CAN’T PLAY “CLUE” FOR EVERY LOGIN, TRANSACTION, NEW USER, MESSAGE, FRIEND REQUEST, ATTACHMENT, PACKET, WINK, POKE, CLICK, BIT... WE BUILD RISK MODELS)
Model Development Process
Target -> Yes/No questions best
Find Data, Variable Creation -> Best part
Data Prep -> Worst part
Model Training -> Pick an algorithm
Assessment -> Catch vs FP rate
Deployment -> Decisioning vs Detection
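For the assessment step, catch rate and false-positive rate can be computed at any threshold from held-out scores and known outcomes. This sketch defines FP rate as the share of goods flagged; some teams instead report false positives per true positive, so it is worth checking which definition a report uses. Data are hypothetical.

```python
def catch_and_fp_rate(scores, labels, threshold):
    """Catch rate: share of bads flagged. FP rate: share of goods flagged."""
    flagged = [score >= threshold for score in scores]
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    caught = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    false_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    return caught / n_bad, false_pos / n_good

# Hypothetical held-out scores and outcomes (1 = bad, 0 = good).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
for threshold in (0.75, 0.5, 0.25):
    print(threshold, catch_and_fp_rate(scores, labels, threshold))
```

Lowering the threshold catches more bads but flags more goods; the assessment is choosing where on that trade-off to sit.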
User IP country <> billing country -> GEOLOCATE IP, CONVERT GEO TO COUNTRY CODE, FLAG ON MISMATCH
Buying prepaid mobile phones -> CART CATEGORY, MERCH RISK LEVEL
Add new shipping address in cart -> DATE ADDED, ADDRESS TYPE, STRING MATCHING
Buyer = phone reseller, static machine ID -> CUSTOMER PROFILE, DEVICE ID, DEVICE HISTORY
How much $$ is at risk? -> TXN $ AMT
What is “normal” for this customer? -> CHURN RISK, CLV, ...; TXNS, LOGINS, ...
What “bad” profiles does this match? -> STOLEN CC, COLLUSION
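The variable-creation steps map fairly directly onto code. In this sketch the GeoIP lookup and merchandise risk table are hypothetical dictionaries standing in for real services, the field names are invented, and the address string matching is deliberately crude.

```python
from datetime import date

# Hypothetical stand-ins for a GeoIP lookup and a merchandise risk table.
IP_TO_COUNTRY = {"203.0.113.5": "NG", "198.51.100.7": "US"}
MERCH_RISK = {"prepaid_phones": "high", "books": "low"}

def normalize(address):
    """Crude string normalization for address matching: lowercase, drop commas, collapse spaces."""
    return " ".join(address.lower().replace(",", " ").split())

def build_variables(order, today):
    ip_country = IP_TO_COUNTRY.get(order["ip"], "UNKNOWN")  # geolocate IP -> country code
    return {
        "ip_billing_mismatch": int(ip_country != order["billing_country"]),
        "merch_risk_high": int(MERCH_RISK.get(order["cart_category"]) == "high"),
        "address_added_today": int(order["address_added"] == today),
        "ship_matches_billing": int(normalize(order["ship_to"]) == normalize(order["bill_to"])),
        "known_device": int(order["device_id"] in order["device_history"]),
        "txn_amount": order["amount"],
    }

order = {
    "ip": "203.0.113.5", "billing_country": "US", "cart_category": "prepaid_phones",
    "address_added": date(2013, 5, 1), "ship_to": "400 Dock Rd, Port City",
    "bill_to": "12 Elm St, Springfield", "device_id": "dev-42",
    "device_history": {"dev-42", "dev-7"}, "amount": 480.00,
}
print(build_variables(order, today=date(2013, 5, 1)))
```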
Model Training
Some algorithms:
- Regression: determines the best equation describing the relationship between the dependent variable and the independent variables
  - Linear regression: best equation is a line
  - Logistic regression: best equation is an S-shaped curve (exponential properties), suited to yes/no targets
- Bayesian: used to estimate regression models; useful when working with small data sets
- Neural nets: can approximate a wide range of non-linear functions and are often highly predictive, but don't explain the relationship between the dependent and independent variables
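[Figure: annotated output of a logistic regression run]
LOGISTIC <DEPVAR> <VAR1> <VAR2>...
- Dependent variable: <DEPVAR>
- Independent variables: <VAR1> <VAR2>...
- Model p-value: should be < significance level (.05)
- (Pseudo) R²: how much of the variation in the dependent variable the independent variables explain
- Odds ratio: factor by which the odds of the dependent variable go up when an independent variable is incremented by one
- Variable p-value (significance): throw the variable out if > .05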
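To show where the odds ratio comes from, this sketch fits a logistic regression in plain Python by gradient ascent on the log-likelihood and exponentiates the coefficients. It does not compute p-values or pseudo R²; those come from a statistics package like the one whose output is annotated above. The data are hypothetical: one binary independent variable and a binary dependent variable.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x1 + ...))) by batch gradient ascent
    on the log-likelihood. coef[0] is the intercept b0."""
    coef = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(coef)
        for row, target in zip(X, y):
            z = coef[0] + sum(b * v for b, v in zip(coef[1:], row))
            err = target - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, v in enumerate(row, start=1):
                grad[j] += err * v
        coef = [b + lr * g / len(X) for b, g in zip(coef, grad)]
    return coef

def odds_ratios(coef):
    """exp(b_j): factor by which the odds of the dependent variable are
    multiplied when independent variable j goes up by one."""
    return [math.exp(b) for b in coef[1:]]

# Hypothetical data: one binary independent variable (say, IP/billing country
# mismatch) and a binary dependent variable (1 = bad).
X = [[0]] * 5 + [[1]] * 4
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]
coef = fit_logistic(X, y)
print(round(odds_ratios(coef)[0], 1))
```

With a single binary variable the result can be checked by hand from the 2×2 table: the odds of bad are 3/1 when the variable is 1 and 1/4 when it is 0, so the fitted odds ratio should come out close to (3/1)/(1/4) = 12.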
GAIN
- More gain/lift = more efficient predictions
- Catch as much as possible (as much of the “bads”)
- Minimize the overall population affected
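Gain and lift can be computed by ranking cases by score and asking what share of all bads falls in the top slice. A minimal sketch with hypothetical scores: lift divides that share by the share of the population touched, so a lift of 1 means no better than random targeting.

```python
def cumulative_gain(scores, labels, fraction):
    """Share of all bads caught when acting on the top `fraction` of cases by score."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    cutoff = round(len(ranked) * fraction)
    caught = sum(label for _, label in ranked[:cutoff])
    return caught / sum(labels)

def lift(scores, labels, fraction):
    """Gain relative to random targeting, which catches `fraction` of the bads."""
    return cumulative_gain(scores, labels, fraction) / fraction

# Hypothetical scores and outcomes (1 = bad).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(cumulative_gain(scores, labels, 0.2), lift(scores, labels, 0.2))
```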
Target
In the end, we only hit what we aim at
And now an example
Everyone loves a good 419 scam
419 example: the 411
Trigger
- A contact receives a 419 from a (free) business email account and contacts the victim (the account owner) out of band (OOB)
Backtrack
- Password was changed (user had to go through the reset process)
- Contacts, inbox, outbox deleted
- Nigerian IP login
Elaboration
- “Reply-to”: changed an “i” to an “l” (same ISP)
- Only takes Western Union
419 example: with love, from Abuja
What is the question?
- p(ATO)
- p(Spam: scam)
- p(Fake acct creation)
What are our available answer/action sets?
What else can we do to detect/mitigate?
419 example: Reducing 911s
Variables
- “New” session variables: new login IP, new login IP country, new cookie/machine ID
- “Change” account variables: change password, change secondary email, change name, change public profile
- “New” activity variables: send to all contacts, # of accounts in “cc” or “bcc”, edit/delete contacts en masse
- Association variables: new recipients, new “reply-to” fields, “similar” accounts created/associated (fuzzy = more difficult)
User empowerment
- Stronger password reset options (SMS)
- Transparency: other current sessions, past session history (IPs, logins)
- Auto-logout of all other sessions upon password reset
- Reporting: details of the elaboration as well as cut-and-paste messages
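The “reply-to” elaboration (one letter swapped, same ISP) is the kind of fuzzy association variable that is more difficult to build. One simple approach, assumed here rather than taken from the talk, is to flag a reply-to address within one edit (Levenshtein distance) of the sender's address. The addresses are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def lookalike_reply_to(sender, reply_to, max_edits=1):
    """Flag a reply-to that is close to, but not identical to, the sender."""
    distance = edit_distance(sender.lower(), reply_to.lower())
    return 0 < distance <= max_edits

print(lookalike_reply_to("alice.jones@example.com", "allce.jones@example.com"))  # True
print(lookalike_reply_to("alice.jones@example.com", "alice.jones@example.com"))  # False
```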
Recap
Protecting customers requires understanding not just technology but also behavior. This requires:
- Activity data
- Clear definitions of “good” vs “bad” results
- Constant feedback
- Analysis
Designing data-driven defenses
- Decisions that can be automated with data
- Where/what data sets to use
- Business drivers to keep in mind
An example
BIG DATA & LITTLE LOOPS
p(bad) = f(variable A + variable B + ...)
Prediction is very difficult, especially about the future
Niels Bohr
Allison Miller @selenakyle