Improving the Privacy of Wireless Protocols Jeffrey Pang
Carnegie Mellon University
Slide 2
2
Slide 3
3 Protocol Header Protocol Control Info
Slide 4
What can Protocol Control Info Reveal? 4
www.bluetoothtracking.org Location traces can be deanonymized
[Beresford 03, Hoh 05-07, Krum 07] Kims House
Slide 5
Who Might be Tracking You? 5
Slide 6
Talk Overview Motivation Quantifying the tracking threat
Building identifier-free protocols Other research 6
Slide 7
Building identifier-free protocols Quantifying the tracking
threat My work: How to build efficient protocols that reveals no
transmitted bits to eavesdroppers [MobiSys 08 Best Paper, HotNets
07] Previous work: Temporary device addresses [Gruteser 05, Hu 06,
Jiang 07, Stajano 05] My work: Temporary addresses are not enough;
Other protocol info can be used to track devices [MobiCom 07, HotOS
07] Talk Overview 7
Slide 8
Name: Bobs Device Secret: Alice
PHY Layer Signatures Charlie -> AP Alice -> AP Charlie
-> AP ??? -> AP Physical layer fingerprints [Brik 08, Patwari
07, Hall 04] Require uncommon and/or expensive hardware Not as
accurate in all circumstances These may be obscured by adding
analog noise in hardware SlyFi raises the bar and is a necessary
first step 56
Slide 57
Linking with Signal Strength Side-channel attack accuracy
degrades significantly even if attacker tries to use signal
strength to link packets Side-channel attack accuracy degrades
significantly even if attacker tries to use signal strength to link
packets Attack: website finger-printing [Liberatore 06] Attacker
has 5 nodes to record packets RSSIs Attacker uses k-means
clustering to determine which packets belong to each client. Set of
RSSIs is the feature vector. Varying RSSI by +/- 5db reduces
accuracy even further to 30% [Bauer 08] 57
Slide 58
Why not Time for Data Transport? Data messages: Frequent: sent
often to deliver data (1000+ pkts/sec) Wide interface: many
applications, many side-channels Linkability at short timescales is
NOT usually OK Can NOT use loosely synchronized time to synchronize
i TiTi AB TiTi TiTi TiTi = = = 58
Slide 59
Other Protocol Details Broadcast All broadcast packets routed
through the AP Use same shared key for all the clients of the AP
Higher-layer binding Clients report pseudonym MAC address-to-IP
address bindings to AP AP answers all ARP queries Time
synchronization and roaming Use protected broadcast to transmit
timestamps, same BSSID info Coexistence with 802.11 Encapsulate
SlyFi in anonymous 802.11 frame with unused FC code Clients first
search for SlyFi AP, then fall back to non-private AP search
Link-layer ACKs If fast enough, just acknowledge last SlyFi token
sent Our software implementation uses windowed ACKs 59
Slide 60
Multi-Party Discovery 60 Search Probe Enc(K payload1, 0, )
MAC(K payload2, )AB T i T i = AES(K AB, i) token K payload, offset
TiTiTiTi AB T i T i = AES(K AB, i)AC token K payload, offset
TiTiTiTi AC... Enc(K AB,0, ) MAC(K AB, ) encrypt auth Enc(K AB,0, )
MAC(K AB, ) encrypt auth random key On receipt, check each 16-byte
block that could be a token 1500 byte packets at most 31 lookups
per packet Can be done in parallel in hardware What if there are
more than 31 receivers? Option 1: Duplicate packet with different
tokens in each Option 2: Approximate token matching; open problem
and future work
Slide 61
Packet Format Tryst: Discovery/Binding Shroud: Data Transport
TokenUnencrypted Message 61
Slide 62
Protocol Timing Diagram Discover Authenticate and Bind
Authenticate and Bind Send Data 62
Slide 63
Comparison protocols wifi-open:802.11 with no security
wifi-wpa: 802.11 with WPA PSK/CCMP public-key:straw man
symmetric-key:straw man armknecht: previous header encryption
proposal Performance Evaluation Similar 63
Slide 64
Link Setup Failures Encrypt everything fails to setup many
links 64
Slide 65
Token Computation Time Token computation time is negligible
(Once every token interval) Using software AES, 256 Mhz Geode
processor 65
Slide 66
Data Throughput vs. Packet Size SlyFi data transport overhead
is similar to WPA 66
Slide 67
Data Throughput SlyFi data filtering is about as efficient as
802.11 With simulated AES hardware Performs like symmetric key
Higher = Better 67
Slide 68
Empirical Stream Interleaving Many streams interleaved even at
short timescales 68
Slide 69
Empirical Background Probe Rate Background probes are frequent
in practice 69
Slide 70
Empirical # Probes per Client Some clients probe for many
network names 70
Slide 71
=== MOBICOM === 71
Slide 72
Tracking 802.11 Users Tracking scenario: Every users changes
pseudonyms every hour Adversary monitors some locations One hourly
traffic sample from each user in each location ? ? ? tcpdump Build
a profile from training samples: First collect some traffic known
to be from user X and from random others tcpdump..... ? ? ? Then
classify observation samples Traffic at 2-3PM Traffic at 3-4PM
Traffic at 4-5PM Traffic at 2-3PM Traffic at 3-4PM Traffic at 4-5PM
Traffic at 2-3PM Traffic at 3-4PM Traffic at 4-5PM 72
Slide 73
Sample Classification Algorithm Core question: Did traffic
sample s come from user X? A simple approach: nave Bayes classifier
Derive probabilistic model from training samples Given s with
features F, answer yes if: Pr[ s from user X | s has features F ]
> T for a selected threshold T. F = feature set derived from
implicit identifiers 73
Slide 74
Deriving features F from implicit identifiers Set similarity
(Jaccard Index), weighted by frequency: IR_Guest SAMPLE FROM
OBSERVATION Sample Classification Algorithm linksys djw SIGCOMM_1
PROFILE FROM TRAINING Rare Common w(e) = low w(e) = high 74
Slide 75
Evaluating Classification Effectiveness Simulate tracking
scenario with wireless traces: Split each trace into training and
observation phases Simulate pseudonym changes for each user X
DurationProfiled UsersTotal Users SIGCOMM conf. (2004) 4 days377465
UCSD office building (2006) 1 day153615 Apartment building (2006)
14 days39196 75
Slide 76
Question: Is observation sample s from user X? Evaluation
metrics: True positive rate (TPR) Fraction of user Xs samples
classified correctly False positive rate (FPR) Fraction of other
samples classified incorrectly Evaluating Classification
Effectiveness Pr[ s from user X | s has features F ] > T = 0.01
= ??? Fix T for FPR Measure TPR (See paper) 76
Slide 77
Results: Individual Feature Accuracy TPR 60% TPR 30% Individual
implicit identifiers give evidence of identity 1.0 77
Slide 78
Results: Multiple Feature Accuracy Users with TPR >50%:
Public: 63% Home: 31% Enterprise: 27% We can identify many users in
all environments PublicHomeEnterprise netdests ssids bcast fields
78
Slide 79
Results: Multiple Feature Accuracy Some users much more
distinguishable than others Public networks: ~20% users identified
>90% of the time PublicHomeEnterprise netdests ssids bcast
fields 79
Slide 80
One Application Question: Was user X here today? More difficult
to answer: Suppose N users present each hour Over an 8 hour day, 8N
opportunities to misclassify Decide user X is here only if multiple
samples are classified as his Revised: Was user X here today for a
few hours? 80
Slide 81
Results: Tracking with 90% Accuracy Many users can be
identified 81
Slide 82
=== WIFI-REPORTS === 82
Slide 83
Will Reports Improve Selection? Commercial APs at Seattle
hotspot locations red = official AP grey = other visible AP 83
Slide 84
=== OLD SLIDES === 84
Slide 85
Other Research Wireless Service Discovery [MobiSys 09, HotNets
07] Wi-Fi hotspot reputation system Private device discovery
without per-device bootstrapping Peer-to-Peer Games [SIGCOMM 08,
NSDI 06, IPTPS 07] Distributed File Systems [ICDCS 07] Internet
Measurement [IMC 04 x2, SIGCOMM 04] DNS infrastructure properties
DNS cache compliance BGP routing and multi-homing 85
Slide 86
What can Protocol Control Info Reveal? 86 Wardriving Database
Me
Slide 87
What can Protocol Control Info Reveal? 87 Wardriving Database
David Wetherall
Slide 88
Previous work: Periodically change device addresses [Gruteser
05, Hu 06, Jiang 07, Stajano 05] Central Challenge From:
19:1A:1B:1C:1D:1E Prevent third parties from tracking wireless
devices tcpdump 88
Slide 89
Central Challenge My thesis work: Temporary device addresses
are not enough; Other protocol info can be used to track devices
[MobiCom 07, HotOS 07] How to build efficient wireless protocols
that reveals no transmitted bits to eavesdroppers [MobiSys 08 Best
Paper, HotNets 07] Prevent third parties from tracking wireless
devices 89
Slide 90
Implicit Identifiers Example Implicit identifier: other
protocol fields Header bits (e.g., power management) Supported
transmit rates Supported authentication algorithms tcpdump ? time
Protocol Fields: 11,4,2,1Mbps, WEP Protocol Fields: 11,4,2,1Mbps,
WEP 90
Slide 91
Wireless Protocol Primer Same basic protocol for 802.11 ad-hoc
802.11 infrastructure Bluetooth Other wireless protocols 91
Slide 92
Wireless Protocol Primer Name: Bobs PSP Password: Alice