16
Int. J. Inf. Secur. (2015) 14:205–220 DOI 10.1007/s10207-014-0256-7 REGULAR CONTRIBUTION Detection and analysis of eavesdropping in anonymous communication networks Sambuddho Chakravarty · Georgios Portokalidis · Michalis Polychronakis · Angelos D. Keromytis Published online: 18 August 2014 © Springer-Verlag Berlin Heidelberg 2014 Abstract Anonymous communication networks, like Tor, partially protect the confidentiality of user traffic by encrypt- ing all communications within the overlay network. How- ever, when the relayed traffic reaches the boundaries of the network, toward its destination, the original user traf- fic is inevitably exposed to the final node on the path. As a result, users transmitting sensitive data, like authentication credentials, over such networks, risk having their data inter- cepted and exposed, unless end-to-end encryption is used. Eavesdropping can be performed by malicious or compro- mised relay nodes, as well as any rogue network entity on the path toward the actual destination. Furthermore, end-to- end encryption does not assure defense against man-in-the- middle attacks. In this work, we explore the use of decoys at multiple levels for the detection of traffic interception by malicious nodes of proxy-based anonymous communication systems. Our approach relies on the injection of traffic that exposes bait credentials for decoy services requiring user authentication, and URLs to seemingly sensitive decoy doc- uments which, when opened, invoke scripts alerting about being accessed. Our aim was to entice prospective eaves- droppers to access our decoy servers and decoy documents, using the snooped credentials and URLs. We have deployed our prototype implementation in the Tor network using decoy S. Chakravarty (B ) · M. Polychronakis · A. D. Keromytis Columbia University, New York, USA e-mail: [email protected] M. Polychronakis e-mail: [email protected] A. D. Keromytis e-mail: [email protected] G. Portokalidis Stevens Institute of Technology, Hoboken, USA e-mail: [email protected] IMAP, SMTP, and HTTP servers. During the course of over 30 months, our system has detected 18 cases of traffic eaves- dropping that involved 14 different Tor exit nodes. Keywords Tor · Anonymity networks · Proxies · Eavesdropping · Decoys 1 Introduction Many services and protocols rely on non-encrypted commu- nication. Consequently, malicious users or organizations that have access to the network elements through which user traf- fic is routed can eavesdrop and obtain sensitive data, such as user authentication credentials. This situation can potentially worsen when users employ proxy-based systems to access the same services without using end-to-end encryption, as the number of hosts or nodes that can eavesdrop on their traffic increases. Various public and private networks may block access to social networking and other popular online services for various reasons. Under these conditions, users often resort to using distributed proxying systems to prevent their traffic from being filtered. They resort to such mecha- nisms so as to evade network traffic filtering based on source, destination, and content. Anonymous communication systems [1, 2, 22, 31, 44] are popular examples of proxy-based systems, which enable users to hide their IP address from the services they use, and often employ encryption by design. Systems such as Tor [22] route data through a series of proxies. Data packets are encrypted multiple times [16], so that if adversaries inter- cept the traffic en-route to the destination, they will not be able to determine the actual source or destination of the traf- fic. The process also aids in achieving confidentiality against eavesdropping adversaries who can observe the traffic and 123

Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

Int. J. Inf. Secur. (2015) 14:205–220DOI 10.1007/s10207-014-0256-7

REGULAR CONTRIBUTION

Detection and analysis of eavesdropping in anonymouscommunication networks

Sambuddho Chakravarty · Georgios Portokalidis ·Michalis Polychronakis · Angelos D. Keromytis

Published online: 18 August 2014© Springer-Verlag Berlin Heidelberg 2014

Abstract Anonymous communication networks, like Tor,partially protect the confidentiality of user traffic by encrypt-ing all communications within the overlay network. How-ever, when the relayed traffic reaches the boundaries ofthe network, toward its destination, the original user traf-fic is inevitably exposed to the final node on the path. As aresult, users transmitting sensitive data, like authenticationcredentials, over such networks, risk having their data inter-cepted and exposed, unless end-to-end encryption is used.Eavesdropping can be performed by malicious or compro-mised relay nodes, as well as any rogue network entity onthe path toward the actual destination. Furthermore, end-to-end encryption does not assure defense against man-in-the-middle attacks. In this work, we explore the use of decoysat multiple levels for the detection of traffic interception bymalicious nodes of proxy-based anonymous communicationsystems. Our approach relies on the injection of traffic thatexposes bait credentials for decoy services requiring userauthentication, and URLs to seemingly sensitive decoy doc-uments which, when opened, invoke scripts alerting aboutbeing accessed. Our aim was to entice prospective eaves-droppers to access our decoy servers and decoy documents,using the snooped credentials and URLs. We have deployedour prototype implementation in the Tor network using decoy

S. Chakravarty (B) · M. Polychronakis · A. D. KeromytisColumbia University, New York, USAe-mail: [email protected]

M. Polychronakise-mail: [email protected]

A. D. Keromytise-mail: [email protected]

G. PortokalidisStevens Institute of Technology, Hoboken, USAe-mail: [email protected]

IMAP, SMTP, and HTTP servers. During the course of over30months, our system has detected 18 cases of traffic eaves-dropping that involved 14 different Tor exit nodes.

Keywords Tor · Anonymity networks · Proxies ·Eavesdropping · Decoys

1 Introduction

Many services and protocols rely on non-encrypted commu-nication. Consequently, malicious users or organizations thathave access to the network elements through which user traf-fic is routed can eavesdrop and obtain sensitive data, such asuser authentication credentials. This situation can potentiallyworsen when users employ proxy-based systems to accessthe same services without using end-to-end encryption, asthe number of hosts or nodes that can eavesdrop on theirtraffic increases. Various public and private networks mayblock access to social networking and other popular onlineservices for various reasons. Under these conditions, usersoften resort to using distributed proxying systems to preventtheir traffic from being filtered. They resort to such mecha-nisms so as to evade network traffic filtering based on source,destination, and content.

Anonymous communication systems [1,2,22,31,44] arepopular examples of proxy-based systems, which enableusers to hide their IP address from the services they use,and often employ encryption by design. Systems such asTor [22] route data through a series of proxies. Data packetsare encrypted multiple times [16], so that if adversaries inter-cept the traffic en-route to the destination, they will not beable to determine the actual source or destination of the traf-fic. The process also aids in achieving confidentiality againsteavesdropping adversaries who can observe the traffic and

123

Page 2: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

206 S. Chakravarty et al.

snoop out sensitive and reusable information such as usernames, passwords, and HTTP cookies.

In such systems, however, the last node on a path canaccess the original message that is being transmitted to theintended recipient. Many users are not aware of this discrep-ancy between the anonymity and privacy guarantees offeredby these systems, and the lack of end-to-end data confi-dentiality which is often mistakenly assumed. Disregardingthe absence of end-to-end confidentiality, users often sendsensitive information through these relays. Some of theserelays, acting with malicious intent, may misuse sensitiveuser information such as user names and passwords, URLsto sensitive information, and HTTP session cookies. Thus,in exchange for anonymity, users place their trust in compo-nents of the anonymous communication system that couldpotentially abuse it. In all cases, user data at some point isavailable in their original form.McCoy et al. [34] have shownthat there are Tor exit nodes which indeed eavesdrop on thetraffic flowing through them, abusing users’ trust.

An obvious solution to such problemsmight involve send-ing traffic encrypted using SSL through relays. However,malicious relay operators can employ man-in-the-middleattacks and snoop on the traffic of even SSL-encrypted ses-sions [51], and attacks of this kind have been observed in theTor network [4].

Our approach for the detection ofmisbehaving relay nodesinvolves the transmission of decoy traffic that contains easilyreusable and seemingly sensitive information (such as fakeplain-text user names and passwords) via all nodes of theanonymization network to decoy servers under our control.Relay nodes eavesdropping on user traffic may try to reusethis information and connect to our decoy servers. In thispaper, we present the overall architecture of our eavesdropdetection system, which can be used to detect eavesdroppingby untrusted nodes of various anonymization systems (andproxying systems in general). We have implemented our sys-tem for detecting eavesdropping bymalicious Tor exit nodes.Tor is among the most widely used relay-based anonymiza-tion networks, with over half a million worldwide users [53].Our system could, however, be easily adapted for other relay-based anonymization networks as well, such as JAP [31] andI2P [28].

The use of fake information, or honeytokens [46], fordetecting unauthorized access to sensitive data, has beenexplored previously for several applications related to net-work intrusion detection and misbehavior detection. How-ever, there has not been adequate research done in using suchinformation and systems to detect misbehavior by otherwisetrusted nodes of anonymization enabling networks and sys-tems. McCoy et al. [34] were the first to explore the useof transmitting decoy TCP traffic through Tor nodes to seewhich nodes eavesdrop on them. Their approach requiredaccess to DNS traffic and could be trivially defeated by

an adversary by simply using appropriate command lineoptions of tools such as tcpdump [33] which by default per-form reverse DNS look up for unresolved IP addresses. Ourapproach, in contrast, neither requires access to DNS traf-fic nor relies on default behavior of traffic capture tools. Wesend innocuous-appearing TCP traffic through Tor exit nodesexposing easily reusable decoy information and periodicallycheck the server logs for subsequent unsolicited connectionattempts. The decoy information being unique to each Torexit helps in easily identifying the Tor exit node involved inthe eavesdropping incident. Section 2 extensively describesall related research endeavors.

In previous work [15], we described howwe could use thesystem to detect eavesdrop using plain-text IMAP and SMTPprotocol messages, exposing fake usernames and passwordstoTor exit nodes.Our previous effort describes various eaves-dropping incidents detected between August 2010 and May2011. Thereafter, based on the activities of various adver-saries who logged into our system using the exposed usercredentials and on ideas and concepts borrowed from vari-ous related research efforts (such as using decoy documentsto detect insider attacks; proposed by Bowen et al. [10–12]),we have extended our systemwith the following components:

Honeypots Based on the detected activities of someeavesdropping incidents (described later in Sect. 3) wehave added SSH and FTP honeypots to the system.Beacon-bearing decoy documents We deployed a webserver which hosts decoy documents, generated usingD3 [10,13], presenting fake but alluring information suchas fake credit card numbers, and usernames and pass-words to fakepaypal.com accounts. These documentscontain beacons that are triggered when the documentsare opened, connecting to a remote site and reportinginformation such as time and IP address of the hostfrom which the document was accessed. The URLs tothese decoy documents are regularly exposed to Tor exitnodes through HTTP GET and POST messages with thehope that potential eavesdropperwould reuse the exposedURLs to access the decoy documents.

Apart from the above additions, we have also exploredhowour system could be used to detectmore advanced eaves-dropping incidents, such as HTTP cookie hijacking and SSLMITM attacks. We describe these in Sect. 5.

As a proof of concept, we have deployed various decoyservers and honeypots, and transmitted decoy traffic to thesesystems via all Tor exit nodes. Our system has been opera-tional for over 30months. During this period, it has detected18 incidents of eavesdropping by Tor exit nodes (some ofwhich have been previously described in our related previ-ous work [15]). In some cases, the adversary connected backto our decoy server from the same exit node itself while in

123

Page 3: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

Detection and analysis of eavesdropping 207

others the adversary connected back from other exit nodes orhosts in other networks. Often, an eavesdropping exit nodestopped serving Tor traffic soon after the eavesdropping inci-dent was detected and in one case even reappeared after sev-eral months.

In summary, the main contributions of this paper are thefollowing:

– An architecture for detecting various forms of trafficsnooping by nodes of anonymous communication net-works (and proxy servers in general) that involves theexposure of reusable decoy information, such as plain-text user credentials andURLs to sensitive appearingdoc-uments containing beacons.

– A prototype of our proposed system using various decoyservers and honeypots that has been deployed in the Toranonymous communication network.

– A detailed forensic analysis of the eavesdropping inci-dents recorded by our system.

2 Background information

2.1 Anonymous network communication systems

Anonymous network communication systems enable usersto hide their identity from their communication peers. Mostof these systems rely on sending traffic via one or moreproxies and may additionally encrypt traffic, using conceptspresented by Chaum [16], to obfuscate the true source ordestination of messages. Such systems are often classifiedas low-latency and high-latency anonymous communicationsystems. Low-latency systems are designed to be efficientfor semi-interactive applications such as web browsing andinstant messaging. High-latency systems are geared towarddelay tolerant applications such as e-mail. Low-latency net-work anonymization systems are further classified based onthe routing paradigms they employ—those that are derivedfrom onion routing [21], and those that are based uponCrowds [44]. Systems such asTor [22], JAP [31], and I2P [28]employ deterministic routing, wherein the set of proxiesthrough which the traffic is sent is known by the connec-tion or session initiator. Systems such as GNUNet [8], Bit-Blender [7], and OneSwarm [30] employ probabilistic traf-fic routing schemes similar to Crowds. Each traffic forward-ing relay in such a system randomly chooses to send thetraffic either to the destination or to another relay in thesystem.

Tor Tor [22] is closely modeled on the onion routing [43]paradigm and is one of the most widely used low-latencyanonymity networks, with an estimated user base of morethan 500,000 users (as of March 2013 [52]). Tor aims to pro-

tect the anonymity of Internet users by relaying user TCPstreams through a network of overlay nodes run by volun-teers. Tor also supports responder anonymity through hiddenservices.1

The Tor overlay network consists of over 2,500 proxies,known as onion routers (ORs), which are mostly operated byvolunteers scattered across the globe. User traffic is relayedthrough circuits, which are formed by persistent TLS connec-tions between different nodes. By default, Tor circuits con-sist of three nodes: the first one is known as the entry node,the second one as the middleman, and the third is called theexit node. During circuit establishment, a Tor client negoti-ates shared secret keys with the relays that it chooses for thecircuit. Thereafter, the client uses these keys to encrypt thetransmitted messages in multiple layers of encryption, start-ing with the public key of the exit node. Each of the nodesthen first “peels off” one layer of encryption and then for-wards the message to the next node on the circuit. The exitnode decrypts the final layer of encryption, which reveals theoriginal plain-text message of the user, and forwards it to itsactual destination through a regular TCP connection. Thus, ifin-transit traffic is intercepted by eavesdroppers, they cannotdetermine the actual source or destination of the traffic.

Figure 1a presents the basic steps for the creation of a newTor circuit consisting of three onion routers:

1. The Tor client queries the directory service to obtain alist of the available Tor relays.

2. The client uses a set of relays to create Tor circuits. Bydefault, circuits are created using three relays.

3. The client selects one of the circuits and creates a TCPconnection to its entry node. Traffic is forwarded throughthe circuit to the exit node, which communicates directlywith the actual destination.

Tor also provides configurable access control features toexit node operators. Usually, Tor exit nodes are configured toallow traffic forwarding for only a small set of TCP services.The supported services are defined by the operator of the exitnode through the specification of an exit policy.

Crowds Reiter et al. designed Crowds [44] for anonymousweb browsing. Crowds is based on the principle of proba-bilistic forwarding. Like Tor (and other onion routing sys-tems), it relies on an overlay network for enabling responderanonymity. The nodes of the overlay network are called jon-dos. Aweb browser using this system forwards a web requestto a randomly chosen jondo. Upon receiving the request, the

1 A TCP-based service can keep its IP address hidden (and thus itsidentity) by replacing the IP address with a hidden service URL. TheseURLs end in a virtual top-level domain called “.onion” and are resolvedby a Tor clients while initiating connection to the hidden service.

123

Page 4: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

208 S. Chakravarty et al.

D

Tor Node

Unencrypted Link

TLS Encrypted Link

Directory Server D

Communication to directory services

1DirectoryServer

Tor client

Tor circuit establishmentvia TLS encrypted link

MiddlemanExit Node

2

Exit Node connectsto requested service

TCP/IP basedservice

3

Entry Node

D

Internet

(a) (b)

Fig. 1 Overview of different anonymity networks based on routingparadigms. a Basic steps for communicating through Tor. The clientobtains a list of the available Tor relays from a directory service 1©,

establishes a circuit using multiple Tor nodes 2©, and then starts for-warding its traffic through the newly created circuit 3©. b Crowds usersestablish paths via Jondos to web services

jondo forwards the request to another jondo with probabilityp f or sends it directly to the intended web service. Figure 1bschematically represents the functioning of the crowds.Manyof modern anonymity preserving P2P file-sharing systemssuch as GNUNet [9], BitBlender [7], and OneSwarm [30]are derived fromCrowds.We do not focus further on Crowdsparadigm we have implemented and demonstrated our archi-tecture for the Tor network.

2.2 Network and system misbehavior detection

Our work is closely related to research efforts that involve theexposure of enticing decoy information or resources to lurepotential adversaries, with the objective of identifying themand theirmodus operandi. One of the first uses of decoy infor-mation for enabling the observation of real malicious activityhas been documented by Clifford Stoll [48]. In his book, TheCuckoo’s Egg [49], the author recounts his efforts to trap anintruder that broke into the systems of the Lawrence Berke-ley National Laboratory. As part of his efforts to monitorthe actions and trace the intruder’s origin, he generated fakedocuments containing supposedly classified information thatwould lure the intruder to come back and stay longer on thecompromised computer.

Computer-based systems and resources deployed widelywith the objective of luring prospective adversaries andintruders for logging their identities and actions are widelyknown as honeypots [41,47]. Such systems have no produc-tion value other than being compromised and subsequentlyaid in tracking the actions of the attacker. Honeypots havebeen extensively used for modeling, logging, and analyzing

attacks originating from sources not only external to an net-work [27,56], but also from within its perimeter [13].

Complementary to honeypots, researchers and systemadministrators often use honeytokens [46] which are piecesof information with purpose no other than being interceptedor stolen and abused by an adversary. Any use of these hon-eytokens clearly indicates unauthorized access. The decoycredentials used in our approach, which we describe in detailin the next section, can be classified as a variety of honeyto-kens.

More recently, Bowen et al. [11] proposed the use of decoydocuments to detect misbehaving entities within the perime-ter of an organization. The decoy documents containedembedded “beacons,” such as scripts or mac-ros, which areexecuted when the document is opened. The authors usedfake tax records bearing information appearing to be “sen-sitive” and enticing to an adversary. In case a document isopened, the embedded beacon connects to an external hostand transmits information such as time of access and IPaddress of the hosts used to open the document.

In another related research work, Bowen et al. [12] usedreal WiFi traffic as a basis for the generation of decoy trafficwith realistic network interactions. An API is used to insertbait content, such as popular webmail service cookies, FTPandHTTPmessages, and so on into these decoy packets. Thepacketswere then broadcasted through an unencryptedWi-Finetwork and exposed to potential eavesdroppers. Unsolicitedconnection attempts to the services, using the bait credentials,are marked as illegitimate. In their experiments, the authorsreplayed gmail.com and PayPal.com messages carry-ing credentials and cookies for decoy accounts, and utilizedthe last login IP address feature of these services for deter-

123

Page 5: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

Detection and analysis of eavesdropping 209

mining illegitimate connection attempts. Such techniques arenot applicable anymore as the aforementioned popular webmail and financial services encrypt their connections withSSL.

There has been little effort in detecting misbehaving over-lay nodes of anonymity networks. In a work most closelyrelated to ours, McCoy et al. [34] attempted to detect eaves-dropping onmalicious Tor exit routers by taking advantage ofthe IP address resolution functionality of network traffic cap-turing tools. Packet sniffing tools such as tcpdump [33] areby default configured to resolve the IP addresses of the cap-tured packets to their respective DNS names. Their systemtransmitted, via Tor exit nodes, TCP SYN packets destinedto unused IP addresses in a block owned by the system’soperator. When the packet capturing program attempted toresolve the IP address of a probe packet, it issued a DNSrequest to the authoritative DNS server. The system’s oper-ator had access to the traffic going to this authoritative DNSserver. Thus, requests to this DNS server with the unused IPaddress were an indication that probe packets had been inter-cepted by somepacket capturing programand could be tracedback to the network host where theywere captured. However,when capturing traffic on disk,tcpdump by default does notresolve any addresses, and in any case the eavesdropper cantrivially disable this functionality, rendering the above tech-nique ineffective.

In contrast to thatwork, our systemdoes not require accessto the DNS server traffic. We employ decoy clients andservers which communicate via Tor exits and present easilyreusable sensitive appearing information such as user creden-tials and URLs to sensitive appearing documents which alsocontain beacons such as those described above. The systemperiodically monitors the client and server logs for unso-licited connection attempts to the server which are markedas suspicious. Our system is flexible enough to be adaptedto detect eavesdropping in various different protocols. As aproof of concept, we have augmented the system we pre-sented in our previous paper [15] with an SSH honeypotand decoy FTP and HTTP servers presenting decoy docu-ments containing beacons. Additionally, we explored detect-ing HTTP cookie hijack from traffic going to social networksites and HTTPS man-in-the-middle attacks by maliciousexit node operators.

3 System architecture

In this section,we present the architecture of our traffic eaves-dropping detection system that we have deployed for Tor.Wedescribe the design of the decoy traffic transmission mech-anism and the corresponding decoy services, as well as theapproachweused for incident data collection and correlation.

3.1 Approach

Eavesdropping on network traffic is a passive operation with-out any directly observable effects. However, the fact thatsome traffic has been intercepted can be potentially inferred,when a third party that should not have access to the inter-cepted data uses it. For example, an eavesdropper can stealuser credentials for services that do not use application-layerencryption, such as user names and passwords for Web siteswith poor user authentication implementations, or for serversthat use clear-text sign-in protocols, such as FTP or IMAP.Thereafter, any attempt by the eavesdropper to access theuser’s account is an observable event. The problem with thelatter lies in (a service) identifying whether the use of a setof credentials was made by a third party or the user.

Our approach is based on the assumption that an eaves-dropper will use the intercepted data in somemanner.We usetwo types of decoy data that are not used in any other way, todeterminewith certainty that data was intercepted by a proxy.First, we use decoy authentication credentials to decoy ser-vices, essentially honeypots, under our control. The use ofthese credentials with our services at a later time is a clearindication that eavesdropping occurred. Second, we transmitURLs to decoy documents containing information of poten-tial value, such as decoy PayPal.com accounts, and fakefinancial transactions including fake credit card information.Later downloads of these documents also indicate eavesdrop-ping. Every decoy is uniquely transmitted through exactlyone proxy, so that we can later associate its use with it.

We further exploit the fact that the eavesdropper will prob-ably attempt to open these documents to collect further infor-mation.We use D-Cubed [10] to automatically generate thePDF and Microsoft Word documents and embed beacons,basically scripts, into them. These documents present infor-mation about fake Pay-Pal.com and credit card accounts,to lure adversaries. These scripts get launched automatically,when the documents are viewed by the eavesdropper, con-necting to a remote host under our control, and transmittinginformation such as the IP address of the host used to openthese documents. This aids us in gathering more informationabout adversaries, e.g., their geographic location based ontheir IP address.

Figure 2 illustrates the overall design of our system whenapplied on theTor network.A client under our control period-ically connects through Tor to our decoy server and transmitseasily reusable clear-text information, such as user authen-tication credentials. As a result, such easily re-usable andpotentially sensitive information is exposed to exit nodesof Tor circuits (and any other network entity between theexit node and the decoy server). Both the client and theserver record detailed information about any attempted con-nection (such as user credentials, URLs, and connectiontime stamps). These logs are thereafter periodically tal-

123

Page 6: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

210 S. Chakravarty et al.

Fig. 2 Overall architecture ofthe proposed traffic interceptiondetection system when appliedon the Tor network

Tor Client Decoy

Servers

Malicious Tor Exit Relay

BenignTor Exit Relays

Internet Bait traffic via Tor circuits

Correlate connection initiation times with receive times

SSH Honeypot

Decoy FTPServer

IMAP ServerHTTP Server

SMTP Server

Server LogsClient Logs

Various HTTPS SitesMITM Detection

HTTP CookieHijackDetection

Social NetworkingSites

lied to determine unsolicited connection attempts, which aremarked as suspicious.

In more detail, as the system is continuously running, thefollowing steps take place periodically:

1. The client connects to the decoy server through Tor andsends information such as unique user authentication cre-dentials (in case of IMAP and SMTP decoy servers)in clear-text, and URLs for sensitive appearing decoydocuments containing beacons (in case of HTTP decoyserver) through clear-textHTTPGETandPOSTprotocolmessages. The client creates circuits through all the exitnodes. Using unique user credentials and URLs per exitnode helps in identifying the actual exit node involvedwhen eavesdropping is detected.

2. The decoy server maintains detailed record for each ses-sion that may include the user name and password (forIMAP and SMTP), the IP address of the exit node usedin the connection, the URL pointing to the unique decoydocuments, and the time stamp corresponding to whenthe connection to the decoy server was established.

3. After a successfully completed session on the decoyserver, the system attempts to correlate it with a recentlycompleted client session. Connections observed on theserver for which there are no corresponding client con-nection attempts are labeled as suspicious.

Each of the unique user credentials and URLs to decoydocuments is associated with exactly one exit node and isexposed to it through a Tor circuit terminating at that node.Thus, the exit node involved in a particular eavesdroppingincident is known based on the given set of credentials or

URLs used in the unsolicited session observed by the decoyserver.

Our system has been optimized compared to our initialprototype [15] for gathering more information regarding theadversaries’ activities. Attempts to reuse some of the IMAPdecoy credentials with other services, like FTP and SSH,urged us to also set up honeypots with decoy FTP and SSHservices running. Moreover, we used the FTP server to alsoserve decoy documents. We describe all incidents in detailand present up-to-date information for new and previouslydetected eavesdroppers [14], in Sect. 4.

Note that our approach can be adapted to detect moreadvanced traffic interception attacks. In Sect. 5, we describean extension that can detect HTTP cookie hijacking attacksand man-in-the-middle attacks by malicious Tor exit nodes.

3.2 Implementation

Although Tor can forward the traffic of any TCP-based net-work service, in practice not all exit routers support all appli-cation protocols. For example, SMTP relay through port 25is blocked by the majority of Tor exit nodes to prevent spam-mers from covertly relaying their messages through the Tornetwork. Consequently, the first important decision we hadto take before beginning the implementation of our prototypesystemwas to choose a set of services that are supported by alarge number of Tor exit nodes. At the same time, candidateservices should support unencrypted authentication througha clear-text protocol, while the services themselves shouldentice potential eavesdroppers.

Tor exit nodes are usually configured to allow trafficforwarding or only a small set of TCP services. The ser-

123

Page 7: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

Detection and analysis of eavesdropping 211

0

100

200

300

400

500

600

700

800

900

1000

21 23 25 80 110

143

587

3128

3306

5060

5190

5222

8080

Service port number

Exi

tro

ute

rs t

hat

allo

w t

raff

ic r

elay

ing

21 FTP23 Telnet25 SMTP-relay80 HTTP

110 POP3143 IMAP587 SMTP-delivery

3128 Squid proxy3306 MySql5060 SIP5190 ICQ/AOL-IM5222 XMPP8080 HTTP-alt

Fig. 3 Number of Tor exit nodes that allow traffic relaying through dif-ferent TCP port numbers, for services that support clear-text protocols

vices allowed are defined by the operator of the exit nodethrough the specification of an exit policy. To determine themost widely supported unencrypted application protocols,we queried the Tor directory servers and retrieved the numberof exit nodes that allowed each different protocol. Figure 3presents the number of Tor exit nodes that at the time of theexperiment allowed the relaying of traffic through variousTCP port numbers. In accordance with the results obtainedby McCoy et al. [34], widely used applications such as webbrowsing, email retrieval, and instant messaging are allowedby a large number of exit nodes.We found approximately 900exit nodes that allowed access to port 80. We also found 644exit nodes supporting exit to IMAP (port 143) and 455 exitnodes supporting SMTP delivery (port 587). Both these pro-tocols support plain-text user authentication. They involvetransmission of plain-text usernames and passwords.2

Credentials for accessing users’ messages that may con-tain sensitive private information, or for sending e-mailsthrough verified user addresses, can be of high value for amalicious eavesdropper. This led us to choose the IMAP andSMTP protocols for our prototype implementation. Further-more, due to the wide prevalence of exit nodes that allowexit for web traffic, we decided to use HTTP GET and POSTmessages to expose URLs of decoy documents containingbeacons, stored on a decoy web server that we control.

3.2.1 Decoy traffic transmission and eavesdroppingdetection

Our decoy traffic transmission subsystem is based on a cus-tom client that supports the IMAP and SMTP protocols. The

2 In contrast to SMTP relay (port 25), SMTP through port 587 is dedi-cated to message submission for delivery only for users that have reg-istered accounts on the server.

client has been implemented using Perl, and service proto-col emulation is provided by the Net::IMAPClient andNet::SMTP modules. We use curl [24] for transmittingthe HTTP GET and POST messages to expose the URL ofdecoydocuments to the exit nodes. The clients and servers arehosted on Intel x86 based machines running Ubuntu Linux.

Every day, for each service, our Tor client connects to thedecoy service several times via circuits through each of theexit nodes. This is achieved by establishing a new Tor circuitfor each connection, and enforcing each circuit to use a par-ticular exit node. Once a connection has been established, theclient authenticates on the server using a unique set of creden-tials associated with the particular combination of exit nodeand decoy server.3 Thereafter, the client performs activities,such as browsing through some folders in case of IMAP, orsending a fake e-mail message in case of SMTP, so as theprotocol message exchanges appear realistic. In case someexit node is not accessible, the corresponding set of creden-tials is skipped. Similarly, when a new exit node joins theoverlay network, a new set of credentials for each decoy ser-vice is generated for use only with that exit node. To achievethis, the Tor directory services are periodically queried andfresh list of exit nodes supporting exit to the requisite service(IMAP or SMTP delivery) is retrieved. Thereafter, new usercredentials are assigned to the set of fresh exit nodes.

Usernames are generated as a combination of namesin various languages [38] and random numbers usinglanguage confluxer [39] and prop [40]. Passwordsfor these usernames are generated using pwgen [54].

Similar to the decoy IMAP and SMTP client processes,a routine sends and retrieves decoy documents to and froma decoy web server, through circuits via each Tor exit. Theprocess exposes URLs of the decoy documents carrying thebeacons, to the exit node throughHTTPPOST andGETmes-sages. Each exit node is associated with a set of unique decoydocuments which are sent to and received from the server,through HTTP POST and GET messages, respectively. Weassume rogue exit nodes to be snooping on HTTP traffic andaccessing the exposed decoy document URLs.

Under normal operational conditions, the number of con-nections successfully initiated by the client each day, througheach exit node, should equal to the number of connectionsreceived by the server from each of these exit nodes. Foreach successfully initiated connection, the client-side scriptslog information such as exit node involved decoy creden-tials used and the connection start and end times. The cor-responding information for these connections, as seen at the

3 In other words, for each exit node that allows access to IMAP, we cre-ated a unique username and password. This unique association of theexit node and the exposed user credential helps identify the eavesdrop-ping exit nodes that snoop on these exposed credentials and connectback to our decoy server.

123

Page 8: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

212 S. Chakravarty et al.

server, is obtained from the server’s logs. Any unsolicitedsuccessful connection using some of the previously trans-mitted decoy credentials is labeled as an illegitimate suspi-cious connection attempt. Such suspicious connections areidentified by tallying the connections initiated by our clientto those received by the server, based on the logs recorded atthe client and the server. Specifically, upon the completionof a successful connection, the decoy server sends directly(not through Tor) to the client all the recorded informationabout the recently completed session. The client then com-pares the connection details, including the set of credentials,decoy documents, the exit node involved, and the start andend times of the connections recorded by both the client andthe server, against the recently completed connections. Foreach connection successfully initiated by the client, the sys-tem checks whether the corresponding connection can beidentified from the server logs. Since the system knows theexit node involved each connection, and the fact that it is usedwith exactly one set of decoy credentials, it is expected thatfor each connection initiated by the client, there is exactly oneconnection arriving at the server that matches the said con-nection attempt. In case no matching connection is found,the system generates a report that includes the time of thelast generated connection that used the intercepted creden-tials, the time of the unsolicited connection to the server, theIP address of its initiator, and the exit node involved in theincident.

3.2.2 Important implementation considerations

During the implementation of our prototype system, we dealtwith various issues related to improving the accuracy of ourtraffic interception detection approach, or with cases whereinteresting design trade-offs came up. We briefly describesome of these issues in the rest of this section.

Quality of decoy traffic and honeypot services The believ-ability of the decoy traffic [12] is a crucial aspect of the effec-tiveness of our approach. For instance, a decoy IMAP sessionusing an account that does not have a realistic folder structure,or that does not contain any real e-mail messages, might raisesuspicions to an eavesdropper. Repeating the same actions inevery session, or launching new sessions at exactly the sametime every day, can also be indications that the sessions areartificially generated. In our prototype system, we vary theconnection times and activity in each session, and we userealistically looking folder structures for the IMAP accountsand send innocuous-appearing e-mail messages. The inboxesof these decoy accounts contain messages attached withdecoy documents containing the beacons, generated fromthe D-Cubed system, presenting enticing information suchas decoy PayPal.com accounts and fake financial transac-tions involving fake credit card numbers. These documents

are attached to e-mail messages containing banking jargonto reduce suspicion.

As mentioned above, in some of the eavesdropping inci-dents, the adversaries actually tried to access other servicessuch as SSH and FTP using the exposed IMAP creden-tials. Thus, we installed a FTP server, hosting user accountscorresponding to each of the IMAP users. Each of theseaccounts used the same passwords, which were used theIMAP accounts. The users’ FTP directories were also popu-lated with decoy documents containing the beacons. Further,to make these accounts appear innocuous, we also placeddocuments taken from [45] and source code documentationsand help files taken from an open source program.

To track the behavior of adversaries who may try to login to SSH accounts using the exposed IMAP credentials, weinstalled kippo [18], an open-source, medium-interaction,and easy to configure SSH honeypot, on the virtual machinehosting our FTP server. Kippo supports multiple users andthus seemed well suited for our setup consisting of severalhundred fake user accounts and passwords. These fake SSHuser accounts of kippo used the same usernames creden-tials as those used for the IMAP and FTP accounts. TheIMAPserver redirects all FTPandSSHrequests to this virtualmachine hosting the SSHhoneypot and the decoy FTP server.

Time Synchronization Accurate time synchronizationbetween the client and the decoy server(s) ensures proper cor-relation of the connections generated by the client with theconnections received by the server, and the correct identifica-tion of any unsolicited connections. Although the volume ofour decoy connections is very low, allowing any illegitimateconnections to easily stand out, the clocks of all hosts in ourarchitecture are kept synchronized using the Network TimeProtocol. The sub-second accuracy ofNTP allows the precisecorrelation of the connection start and end times observed onboth the client and server. This offers an additional safeguardfor the verification of the detected traffic interception inci-dents.

Eavesdropping Incident Verification Besides the accuratecorrelation between the start and end times, logged by theclient and the server, we have taken extra precautions to avoidany inaccurate classification of our generated decoy connec-tions as illegitimate. For each connection launched by theclient, the system also keeps track of the circuit establishmenttimes by monitoring Tor client’s control port. Moreover, wehave enabled all the built-in logging mechanisms providedby the Tor software. On the server side, all the incomingand outgoing network traffic is captured using tcpdump. Inaddition to the server logs, the captured traffic provides valu-able forensic information regarding the nature of illegitimateconnections, such as the exact sequence of protocolmessagessent by the attacker’s IMAP, SMTP, and HTTP clients.

123

Page 9: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

Detection and analysis of eavesdropping 213

4 Deployment results

Our prototype implementation has been continuously oper-ational in the Tor network since August 2010. During thecourse of over 30months of its operation, our system hasdetected sixteen traffic interception incidents. In this sec-tion, we describe the eavesdropping and subsequent mali-cious connection attempts using the snooped user creden-tials. We analyze the consequent activities of the intrudersas they were recorded in the decoy server logs. Our incidentdescriptionWeb site [14] contains information about the exitnodes involved in each incident and details of the activitiesof the intruders once they logged in our decoy server usingthe snooped user credentials.

4.1 Eavesdropping incidents

The observed eavesdropping incidents were related to dif-ferent exit nodes, and all the related illegitimate connectionswere received by our decoy IMAP server. Based on the inter-cepted credentials used in each unsolicited connection, wewere able to identify the Tor exit node involved in each inci-dent. Information about the detected incidents (e.g., date, exitnode location, and activities recorded by the server) is pre-sented in Table 1. The detail of the remaining incidents areavailable in our Web site [14].

While most of the incidents involved a different exit node,there were some, such as one in India and another one inSouth Korea, which eavesdropped on exposed credentialsrepeatedly and connected back to our decoy server. The onesin India, hosted in the same ISP network, repeatedly con-nected back to our decoy server for weeks. Even after wemodified the passwords associated with the IMAP accounts,the exit operators learnt the modified passwords by snoop-ing on the traffic and connected back to the decoy serverwith these new passwords. The node in South Korea eaves-dropped on our decoy traffic three times. Although after eachconnect-back attempt the exit node was inaccessible, it sur-faced after sometime and again attempted to eavesdrop onthe decoy traffic.

There were also several incidents in which the maliciousexit nodes were not accessible for days after the eavesdrop-ping incidents and subsequent connect-back attempts. Also,as evident from Table 1, in the majority of the incidents, theadversaries connect to the decoy server via other exit nodesor hosts, probably as an attempt to hide their true identities.However, in the first four incidents, which occurred together,the connect-back attempts originated directly from the exitnodes at which the decoy user credentials were exposed. Allthese connect-back attempts occurred within four to 6h afterthe exposure of the decoy credentials, an interval signifi-cantly shorter compared to the rest of the incidents. We thusspeculate that these first four eavesdropping cases were coor-

dinated by the same individual or group, probably using thesame tools or methodology in each case.

Ten eavesdropping incidents were detected betweenAugust 2010 and April 2011. These have been described indetail in our previous paper [15]. Eight new incidents weredetected between November 2011 and March 2012. By andlarge, in each of these incidents, the modus operandi of theadversaries was similar to that of the previous incidents.

The first of these incidents that occurred in November2011 involved an exit node in the UK. The exit node operatorconnected back to our server via a host that was runningin a cloud service provider’s network. Thereafter, in severalinstances, the involved exit node was inaccessible soon afterthe eavesdropping incident.

The second incident (the 12th one in Table 1) involved anexit node in Russia. In this, the adversary tried to connect toanSSHserver using the exposed credentials,which promptedus to install an SSH honeypot. All SSH connections arrivingto the IMAP server, involving the decoy credentials, weredirected to the SSH honeypot.

The next incident involved an exit node in South Koreawhich had eavesdropped earlier in September 2010 and con-nected to the decoy server via another exit node. Thereafter,it was inaccessible for a considerable period of time afterwhich it resurfaced and eavesdropped again.

The fourth incident involved an exit node in Germany.The adversary connected back to the decoy server severaltimes via an exit node in the Netherlands. In each of theseconnect-back attempts, the adversary connected to our decoyserver for a few minutes and logged out. After analyzingthe exchanged protocol messages, it seems that the adver-sary connected to the server without using any known mailclient program (perhaps using their own custom mail clientthat simply connects and disconnects, thereby checking thevalidity of the credentials).4

The fifth one involved an exit node in theNetherlands. Theexit node operator connected back to our decoy server via anexit node in Switzerland. The malicious exit node operatorconnected to our decoy server only once for a short durationand then logged out.

The sixth incident involved an exit node in the USA. Theexit node operator eavesdropped and subsequently connectedback to the decoy server via an exit node in Canada. Theadversary connected several times to the decoy server, usingan email client program (possibly Kmail).

4 Mail clients generally execute a set of commands on the server to fetchthe various user directories associated with an account. The absence ofsuch commands and zero payload length could be a strong indicationthat the adversary does not use any known mail client. We have studiedthe various protocol messages exchanged by various popular mail clientprograms.

123

Page 10: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

214 S. Chakravarty et al.

Table 1 Observed trafficinterception incidents duringfirst 21months of thedeployment

In all cases, the eavesdropperconnected to our decoy IMAPserver using a set of intercepteddecoy credentials

Incident number Date Exit node location Remarks

1 Aug.’10 US Same pattern as in incidents 2, 3, and 4

Connect-back from the same exit node

2 Aug.’10 Hong Kong Same pattern as in incidents 1, 3, and 4

Connect-back from the same exit node

3 Aug.’10 UK Same pattern as in incidents 1, 2, and 4

Connect-back from the same exit node

4 Aug.’10 The Netherlands Same pattern as in incidents 1, 2, and 3

Connect-back from the same exit node

5 Sep.’10 S. Korea Connect-back from a different exit node

6 Sep.’10 Hong Kong Connect-back from a third party host

Exit node not accessible upon detection

7 Sep.’10 India Connect-back from third-party hosts

Exit node not accessible upon detection

8 Jan.’11 Germany Connect-back from third-party hosts

Attempt to use SSL through the IMAP

STARTTLS command

9 Apr.’11 India Connect-back from third-party hosts and other Torrelays

10 Apr.’11 India Same as 9. Both exit nodes in the same

ISP network and many of the third-partyconnect-back hosts were in the same networks(mostly in Europe and India)

Was involved in incident 7

11 Nov. ’11 UK Connect back via a host in a web-hosting and cloudservice providing organization

12 Nov. ’11 Russia Connect back via another host in a Russian ISP

13 Nov. ’11 S. Korea Exit node involved in incident 5

14 Jan. ’12 Germany Connect back via another Tor exit in The Netherlands

15 Jan. ’12 The Netherlands Connect back via another Tor exit in Switzerland

16 Jan. ’12 US Connect back via another host in a Canadian ISP

17 Jan. ’12 S. Korea Exit node involved in incidents 5 and 13

18 Mar. ’12 Estonia Connect back via another host in a Polish ISP

The next one involved the exit node in South Korea, theonewhose operator had eavesdropped twice previously. Eachtime the exit node operator connected back to our server, theexit node became inaccessible for several weeks and resur-faced, only to be detected again by our system. After this sev-enth incident, the exit node was finally blacklisted by Tor’soperators.

Finally, the eighth incident (the 18th one listed in Table 1)involved an exit node in Estonia, whose operator eaves-dropped and connected back to our decoy server via an exitnode in Poland. There was only one connect-back attempt bythe adversary that lasted for about 10min.

Figure 4 presents this time differences between the expo-sure of the decoy credentials and the subsequent connect-back attempts for each of the incidents. The horizontal axisrepresents the eavesdropping and connect-back events. Thevertical axis denotes the time delay between the exposure

0

5

10

15

20

25

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Incidents

Tim

e D

elay

(h

ou

rs)

Fig. 4 Time difference between the exposure of the decoy credentialsand the first connect-back attempt on the decoy server

123

Page 11: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

Detection and analysis of eavesdropping 215

Fig. 5 Locations of the Tor exit nodes involved in the observed traffic interception incidents, and the non-Tor hosts that connected back to thedecoy servers. Numbers refer to the corresponding incidents listed in Table 1

of the decoy credential and its subsequent usage in connect-back attempts.

The map in Fig. 5 presents an overall view of the geo-graphic locations of the exit nodes and the third-party hostsinvolved in the observed incidents. Tor and non-Tor nodesare represented using different symbols. We used basic geo-IP address lookup tools which provide only country-levelaccuracy, so the points on the map denote only the countryin which each host was located. The number next to eachpoint corresponds to the incident number, as presented inTable 1.

Apart fromdetecting eavesdropping on plain-text protocolmessages, our system could be used to detect more advancedattacks. In the next section, we demonstrate how our systemcould be used to detect SSL man-in-the-middle attacks. Wedetected an exit node in Russia that was involved in suchattacks.

4.2 Adversaries’ activities

In someof the incidents, the adversary connected to the decoyserver using popular e-mail clients, while in the rest, theyconnected directly to the server and manually issued proto-col command messages. Popular e-mail clients issue a cer-tain default set of commands to access the mail folders suchas INBOX, Drafts, Sent, and so on. The IMAP proto-col allows users to issue various commands that can resultin fetching the contents of these folders. Each mail clientsissues a somewhat different set of commands to fetch thecontents of these folders. The commands, and their orderin which they are issued, can be treated as a “signature”for the client. We analyzed the signatures of various pop-ular e-mail clients (e.g., Thunderbird, MS Outlook,

Balsa [5], Claw [17], Sylpheed [50], Kmail [32]and Evolution [25]) to determine the possible e-mailclients the adversaries used. From the network traffic, cor-responding to the time when the adversaries connected tothe decoy server using IMAP clients, it appears that Kmail,Evolution, andThunderbirdwere commonly used forconnecting to our decoy server.

In the incidents where the adversary connected manu-ally to the IMAP server, we observed the adversary exe-cuting various different kinds of commands. In one, adver-saries connected and switched to TLS mode so as to hidetheir activities. We thereafter turned off the capability inthe server to switch to TLS mode after establishing con-nections. In another incident, the adversary issued esotericIMAP4 ACL [35] commands. In yet others, the adversarytried to use credentials to log into services such as FTPand SSH. These services were, however, inaccessible usingthe decoy user credentials. These activities compelled usto install the decoy FTP server and SSH honeypot. TheIMAP server redirects the connection attempt to FTP andSSH services to the decoy FTP server and SSH honeypotso as to lure the attackers to the download decoy docu-ments or try to execute programs from the terminal interface,thereby aiding in gathering further information about suchattackers.

4.3 Volume of decoy traffic injected

For each connection to the IMAP decoy server and subse-quent protocol message exchanges, we sent approximately15.4KB of traffic. For 644 decoy accounts, each correspond-ing to a unique exit node, this number totals to approximately10MB. In case of such connections and exchanges to the

123

Page 12: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

216 S. Chakravarty et al.

Table 2 Available bandwidth ofthe malicious exit nodes asreported by http://torstatus.blutmagie.de/

Incident number Advertised Bandwidth Remarks

1 Unknown Relay was not running when accessed

2 Unknown Relay was not running when accessed

3 44Mbit/s Guard node with high uptime

4 20.8Mbit/s Guard node with high uptime

5 1.4Mbit/s Advertises high uptime

6 56Kbit/s Advertises high uptime, runs directory service

7 Unknown Relay was not running when accessed

8 856Kbit/s Guard node with high uptime, runs directory service

9 150Kbit/s Non-guard exit node

10 100Kbit/s Non-guard exit node

11 Unknown Relay was not running when accessed

12 1Mbit/s Guard node with relatively high uptime

13 320Kbit/s Non-guard exit node

14 8.5Mbit/s Non-guard exit node with high bandwidth and uptime

15 Unknown Relay was not running when accessed

16 336Kbit/s Non-guard exit node

17 320Kbit/s Non-guard exit node

18 1.6Mbit/s Guard node with relatively high uptime

SMTP decoy server, we sent about 21.3KB of traffic. For 455decoy SMTP accounts, this figure comes out to be approxi-mately 9.7MB.5

4.4 Attributes of the exit nodes involved in the incidents

Table 2 shows the available bandwidth of the exit nodesthat were involved in the detected incidents. Two of theexit nodes advertised very high available bandwidth (44and 20.8Mbit/s, respectively) and thus are very likely tobe selected in Tor client circuits, as the default Tor circuitnode selection mechanism is biased toward nodes with highadvertised available bandwidth [20]. There were somewhichadvertised somewhat lesser bandwidth of 8.5 and 1.4Mbit/s.Finally, there were some which advertised yet lower band-widths of less than 1Mbit/s. Both of the high bandwidth, andtwo of the lower bandwidth nodes, were guard nodes6 withhigh up-times.

5 This difference is primarily due to the different lengths of IMAP andSMTPmessages. The overhead due to Tor protocol messages, involvingcircuit setup, key exchanges, accounting, and circuit termination, doesnot vary significantly between IMAP and SMTP.6 By default, a fixed set of entry nodes used by Tor clients to defendagainst traffic analysis attacks that can be launched by malicious entryand exit nodes.

5 Other efforts and possibilities: HTTP session cookiehijack and SSL MITM detection

Apart from detecting eavesdropping on plain-text user cre-dentials and HTTP URLs, our system is capable of beingused for various other complex forms of traffic eavesdrop-ping and misuse detection. As a proof of concept, we triedto detect HTTP cookie hijack attacks and SSL man-in-the-middle attacks by malicious exit nodes. We elaborate moreon these efforts in this section.

5.1 Detection of HTTP session hijacking

Besides snooping on users’ traffic, an adversary that hasaccess to unencrypted network data can also mount HTTPsession hijacking attacks against users that connect to socialnetworking sites like facebook.com. Previously, suchsites had no option to encrypt user traffic exceptwhile authen-ticating them. Now, even when using HTTPS, there are var-ious facebook.com applications that switch to HTTPand never switch back to HTTPS again, thereby exposingHTTP session cookies to eavesdroppers. In a session hijack-ing attack, the attacker can steal the session cookie that isincluded in the HTTP requests of authenticated users anduse it to access the user’s account. The fact that social net-working sites are among the most frequently accessed Websites Tor [36], combined with the ease of hijacking user ses-sions using tools like Firesheep [26], makes the possibility of

123

Page 13: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

Detection and analysis of eavesdropping 217

mounting session hijacking attacks on Tor exit nodes quiteattractive for adversaries.

For detecting session cookie hijack attacks, we create sev-eral fake facebook.com user profiles. In this scheme, thedecoy traffic consisted of activity generated by connectingto these facebook.com profiles and performing cannedactivities such as checking messages and status updates.However, we could not create unique facebook.com pro-files for each of the approximately 900 Tor exit nodes thatsupported exit for web traffic. We thus created about twentyaccounts and repeatedly logged into facebook.com byreusing the accounts. We exposed the first account in ourlist to the first exit node in our list, second account to thesecond one in our list, and so on till the twentieth account.After logging in, our system checked the various user profilepages and private message folders for new messages. Thisprocess exposes the session cookies to the exit node severaltime for each of the accounts. Thereafter, our system waitedfor a few minutes and checked the first twenty accounts forchanges in profiles such as status update messages or privatemessages to others users in the contact list of the hijackedprofile. Then, we again exposed the first user account in ourlist to the twenty-first exit node, the second one to the twenty-second and so on. We used a firefox browser automationframework, called iMacros [29], to perform these periodiclogins and canned interactions. We ran our system for about6months but did not find any eavesdropping exit nodes sniff-ing on Facebook cookies.

5.2 SSL man-in-the-middle attack detection

Man-in-the-middle attacks have been observed by somemalicious exit nodes [4] that try to intercept and compro-mise SSL key establishment process and use unverifiable orself-signed certificates. Our system can easily be adapted todetect such man-in-the-middle attacks. To do so, we need totransmit SSL connections via Tor exit node to SSL serviceswhose certificates can otherwise be verified. If in some caseswe are unable to verify the server certificate when accessingthe server via an exit node, we would conclude that the exitnode has possibly manipulated the server to client traffic andmight be intercepting the SSL connection establishment. Inour setup, we used curl to connect to popular HTTPS sitessuch as popular webmail services and banks and checkedwhether server certificate could be verified. One could useother tools such as Tor SSL MITM Checker [37] for check-ing the certificates. This process is repeated for all exit nodes.We found one exit node through which when SSL traffic wasexposed, the server certificate verification failed. This nodewas, however, already blacklisted in the Tor directory ser-vices and would not normally be selected when using thedefault Tor client configuration. In fact, more recently, Win-ter and Lindskog [55] have demonstrated that there are sev-

eral exit nodes in Russia which are involved in various kindsof SSL man-in-the-middle attacks. Based on their observa-tions, they claim that these might be operated by the sameindividual (or group).

6 Discussion and future work

6.1 Detection confidence

Internet traffic crosses multiple network elements until itreaches its final destination. The encrypted communicationused in anonymity networks protects the original user traf-fic from eavesdropping by intermediate network elements,such as routers or wireless access points, until it reaches theboundary of the overlay network. However, the possibility oftraffic interception is not eliminated, but is rather shifted tothe network path between the exit node and the actual des-tination. Consequently, the transmitted decoy credentials inour proposed approach might not necessarily be snooped onthe exit node of the overlay, but on any other network ele-ment toward the destination. This means that in the incidentsdetected byour system, the decoy credentials could have beenintercepted at some other point in the network path betweenthe exit node and the decoy server, and not at the exit nodeitself.

Although the above possibility can never be ruled out com-pletely, we strongly believe that in all incidents the decoy cre-dentials were indeed intercepted at the involved exit node forthe following reasons. The ease of installing and operating aTor exit node means that not only adversaries can easily setup and operate rogue exit nodes, but also that exit nodes oper-ated by honest individuals may be running on systems thatlack the latest software patches, or have poor security config-urations. This may enable adversaries to easily compromisethem and misuse the hosted Tor exit node. At the same time,most of the network elements beyond a Tor exit node areunder the control of ISPs or other organizations that haveno incentive to blatantly misuse intercepted user credentialsby directly attempting to access the user’s accounts. Further-more, in some of the cases, the adversary connected back tothe decoy server from the same exit node involved in the par-ticular eavesdropping incident, raising even more suspicionthat the exit node was rogue or has been compromised. Inseveral cases, the involved exit node was inaccessible imme-diately following the connect-back attempt. In many cases,the eavesdropping adversaries connected back to the decoyserver via other exit nodes, so as to hide their true sources.Such typical actions of almost all adversaries shift the suspi-cion toward Tor relay operators.

Increasing Detection Confidence Using Multiple DecoyServers. As part of our future work, we plan to use multi-

123

Page 14: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

218 S. Chakravarty et al.

ple decoy servers scattered in different networks. Thereafter,we could check for eavesdropping and subsequent replay ofuser traffic, on each of the decoy servers. If eavesdroppingis attempted on a traffic going to only a subset of the decoyservers, then it might be due to a malicious network routerintercepting the path connecting the exit node to the saiddecoy servers. If, however, eavesdropping is attempted forthe traffic going to all the decoy servers via an exit node,it might have been perpetrated by the said exit node. Fur-thermore, one may use different sets of user accounts anddecoy documents for the different decoy servers. Each of theexit nodes would thus be exposed to multiple sets of decoyuser credentials, each one associated with a different decoyserver. If, for a given exit node, eavesdropping is detectedfor one set of decoy user credentials or decoy documents,and not for others, then it might have been on network ele-ments between the exit node and the corresponding decoyserver, corresponding to the said set of decoy credentials ordocuments. However, if eavesdropping is detected for all thedecoy user credentials or documents exposed to it, it mightlikely be involving the exit node, because the network routersin the paths from the exit node to the individual decoy serversare exposed to different decoy user credentials. It seems lesslikely that a network router on a certain path would knowthe decoy user credentials that are exposed to routers onother network paths. That said, these measures do not com-pletely rule out the scenarios wherein the adversary happensto eavesdrop on the traffic when the client access only oneof the decoy servers and not the other(s) or when the net-work paths intervening the exit nodes and the decoy serversintersect (and share common network elements).

6.2 Traffic eavesdropping and anonymity degradation

Traffic eavesdropping on anonymous communication sys-tems might not lead to direct degradation of networkanonymity. However, inadvertently leaking user informationsuch as login credentials can reveal vital information aboutthe users, such as identity, location, service usage, social con-tacts, and so on. Specifically for Tor, the anonymity set com-monly refers to all possible circuits that can be created, orthe set of all possible active users of the system [19].

Traffic eavesdropping might help reveal information likethe language and content of the messages, the particulardialect of the users, or other peculiarities that might helpreducing the size of the anonymity set. For instance, a mali-cious exit node operator might see traffic carrying user datain Greek. Combined with the knowledge that there are aboutseven ISP networks in Greece, this information might helpreducing the anonymity set significantly. Other clues such asthe actual accessed content, the time of access, and the desti-nation of the traffic can as well aid the process of determininga user’s identity.

6.3 Lessons learnt

Until recently, there were no adequate research efforts toidentify malicious eavesdropping nodes of anonymous com-munication networks. Popular anonymization networks likeTor are prone to Sybil attacks [23], where an adversarycould run a few malicious nodes and attract a large frac-tion of the traffic [6] to aid traffic analysis attacks [3,42].In such attacks, an adversary, capable of observing net-work traffic statistics in several different networks, corre-lates the traffic patterns in them and associates otherwiseseemingly unrelated network connections. The process canlead an adversary to the source of an anonymous connection.Such attackers, running malicious nodes (e.g., malicious Torexit nodes), could eavesdrop on users’ traffic to gather pri-vate information. In fact, powerful government organizationscould be operating nodes with high bandwidth, in severalnetworks, to collect sensitive data and aid traffic analysisattacks.

We have explored ways of identifying such malicious exitnodes by deliberately exposing them with “sensitive appear-ing” but fake traffic (such as fake usernames and passwordsfor e-mail accounts), destined to decoy servers under ourcontrol. Such eavesdroppers, having access to traffic enter-ing and leaving exit nodes, could also be potential trafficanalysis attackers. Therefore, by determining eavesdroppers,one could be potentially identifying attackers who may alsolaunch other types of attacks.

Our prototype system uses decoy traffic for some commonTCP/IP services that support plain-text user authenticationmessages. We have successfully demonstrated that the strat-egy could be used to identify malicious Tor exit nodes. Basedon the modus operandi of the eavesdropping adversaries,and their activities recorded in the logs of the decoy servers,we augmented our system to gather more information aboutthe adversaries and explored possibilities of using our strat-egy with more advanced kinds of eavesdropping activities,such as HTTP cookie hijacking attacks and SSL man-in-the-middle attacks.

7 Conclusion

Users of various anonymous communication networks andproxying architectures often misconstrue the anonymityguarantees offered by such systems with end-to-end confi-dentiality. The use of encryption in anonymity networks, likeTor, protects the confidentiality of the user traffic as it is beingrelayed within the overlay network. This protects the origi-nal user traffic against surveillance by local adversaries, asfor example in the case where the user is connected throughan unsecured public wireless network. Even when encryptedusingSSL, users are not safe fromman-in-the-middle attacks.

123

Page 15: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

Detection and analysis of eavesdropping 219

In this paper, we have focused on the problem of detectingmalicious eavesdropping nodes of proxying architectures,especially anonymity networks. To tackle this problem, wehave presented an approach involving the use of decoy net-work traffic injection to detect rogue nodes of anonymitynetworks engaged in traffic eavesdropping. Our approach isbased on the injection of bait credentials and decoy docu-ments, through Tor, to decoy services such as IMAP, SMTP,and HTTP, with the aim to entice prospective snoopers tointercept and actually use the bait credentials and docu-ments. The system can detect if a set of credentials has beenintercepted, by monitoring for unsolicited connections to thedecoy servers and by the alerts generated by the decoy doc-uments containing the beacons, exposed to the exit nodes.We additionally run Honeypots to gather more informationof the attacker. Moreover, our system can be easily adaptedto detect more advanced traffic interception attacks such asHTTP cookie hijack and SSL man-in-the-middle attack.

Our prototype has been operational for over 30months.During this period, the system detected eighteen incidents oftraffic interception, involving exit nodes across the world. Inall cases, the adversary attempted to take advantage of inter-cepted bait IMAP credentials by logging in on the decoyserver, in some cases from the same exit node involved inthe eavesdropping incident. Our system continues to runand detect eavesdropping by malicious exit node operators.Details of the latest incidents can be obtained from our Website [14].

Acknowledgments This work was supported by DARPA and ONRthrough Contracts DARPA-W011NF-11-1-0140 and ONR-MURI-N00014-07-1-090, respectively. Any opinions, findings, conclusions,or recommendations expressed herein are those of the authors and donot necessarily reflect those of the US Government, DARPA, or ONR.

References

1. Anonymizer, Inc. http://www.anonymizer.com/2. Anonymouse. http://anonymouse.org/3. Back, A., Möller, U., Stiglic, A.: Traffic analysis attacks and trade-

offs in anonymity providing systems. In: Proceedings of the 4thInternational Workshop on Information Hiding(IHW), pp. 245–257. Springer, London (2001)

4. Known bad relays. https://trac.torproject.org/projects/tor/wiki/doc/badRelays

5. Balsa—An e-mail client for GNOME. http://balsa.gnome.org/6. Bauer, K., McCoy, D., Grunwald, D., Kohno, T., Sicker, D.: Low-

resource routing attacks against tor. In: Proceedings of the 2007ACMWorkshop on Privacy in Electronic Society (WPES), pp. 11–20 (2007)

7. Bauer, K., McCoy, D., Grunwald, D., Sicker, D.: Bitblender: light-weight anonymity for bittorrent. In: Proceedings of the work-shop on Applications of private and anonymous communications,AIPACa ’08, pp. 1:1–1:8.ACM,NewYork,NY,USA(2008)doi:10.1145/1461464.1461465

8. Bennett, K., Grothoff, C.: Gnunet: gnu’s decentralized anonymousand censorship-resistant P2P framework. http://gnunet.org/

9. Bennett, K., Grothoff, C.: GAP—practical anonymous networking.In: Proceedings of the Privacy Enhancing Technologies Workshop(PET), pp. 141–160 (2003)

10. Bowen, B.M., Hershkop, S., Keromytis, A.D., Stolfo, S.J.: D-cubed. http://sneakers.cs.columbia.edu/ids/RUU/Dcubed/

11. Bowen, B.M., Hershkop, S., Keromytis, A.D., Stolfo, S.J.: Baitinginside attackers using decoy documents. In: Proceedings of the 5thInternational ICST Conference on Security and Privacy in Com-munication Networks (SecureComm), pp. 51–70 (2009)

12. Bowen, B.M., Kemerlis, V.P., Prabhu, P., Keromytis, A.D., Stolfo,S.J.: Automating the injection of believable decoys to detect snoop-ing. In: Proceedings of the Third ACM Conference on WirelessNetwork Security (WiSec), pp. 81–86 (2010)

13. Bowen, B.M., Salem,M.B., Hershkop, S., Keromytis, A.D., Stolfo,S.J.: Designing host and network sensors to mitigate the insiderthreat. IEEE Secur. Priv. 7, 22–29 (2009). doi:10.1109/MSP.2009.109

14. Chakravarty, S., Polychronakis, M., Portokalidis, G., Keromytis,A.D.: Details of various eavesdropping incidents. http://dph72nibstejmee4.onion/decoys_via_tor/map.html

15. Charavarty, S., Portokalidis, G., Polychronakis, M., Keromytis,A.D.: Detecting traffic snooping in tor using decoys. In: Proceed-ings of the 14th International Symposium on Recent Advances inIntrusion Detection, pp. 222–241 (2011)

16. Chaum, D.L.: Untraceable electronic mail, return addresses, anddigital pseudonyms. Commun. ACM 24(2), 84–90 (1981)

17. Claws mail. http://www.claws-mail.org18. Desaster: kippo ssh honeypot. http://code.google.com/p/kippo19. Díaz, C., Seys, S., Claessens, J., Preneel, B.: Towards measuring

anonymity. In: Proceedings of the 2nd International Conferenceon Privacy Enhancing Technologies. PET’02, pp. 54–68. Springer,Berlin (2003)

20. Dingledine, R., Mathewson, N.: Tor path specification. https://gitweb.torproject.org/torspec.git?a=blob_plain;hb=HEAD;f=path-spec.txt

21. Dingledine, R., Mathewson, N., Syverson, P.: Onion Routing.http://www.onion-router.net/

22. Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion router. In: Proceedings of the 13thUSENIXSecu-rity Symposium, pp. 303–319 (2004)

23. Douceur, J.R.: The sybil attack. In: Proceedings of InternationalWorkshop on Peer-to-Peer Systems (2001)

24. Stenberg, D.: kippo curl. http://curl.haxx.se25. Evolution. http://projects.gnome.org/evolution26. Firesheep. http://codebutler.com/firesheep27. The Honeynet Project. http://www.honeynet.org/28. I2P Anonymous Network. http://www.i2p2.de/29. iOpusTM: iMacros©. http://www.iopus.com/imacros/30. Isdal, T., Piatek, M., Krishnamurthy, A., Anderson, T.: Privacy-

preserving P2P data sharing with oneswarm. In: Proceedings ofthe Conference on Applications, Technologies, Architectures, andProtocols for Computer Communications (SIGCOMM), pp. 111–122 (2010)

31. JAP. http://anon.inf.tu-dresden.de/32. Kmail—mail client. http://kde.org/applications/internet/kmail33. McCanne, S., Leres, C., Jacobson, V.: Tcpdump and libpcap. http://

www.tcpdump.org/34. Mccoy, D., Bauer, K., Grunwald, D., Kohno, T., Sicker, D.: Shining

light in dark places: understanding the tor network. In: Proceedingsof the 8th International Symposium on Privacy Enhancing Tech-nologies (PETS), pp. 63–76 (2008)

35. Meyers, J.: IMAP4 ACL extension. http://www.ietf.org/rfc/rfc2086.txt

36. Mulazzani, M., Huber, M., Weippl, E.R.: Tor HTTP usage andinformation leakage. In: Proceedings of the IFIP Conference on

123

Page 16: Detection and analysis of eavesdropping in anonymous ...angelos/Papers/2015/ijis-anonymity.pdf · Anonymous communication systems [1 ,2 22 31 44]are popular examples of proxy-based

220 S. Chakravarty et al.

Communications and Multimedia Security (CMS), pp. 245–255(2010)

37. Palfrader, P.: Tor SSL MITM check. http://svn.noreply.org/svn/weaselutils/trunk/tor-exit-ssl-check

38. Pound, C.: Chris Pound’s language machines. http://www.ruf.rice.edu/~pound/

39. Pound, C.: Language confluxer. http://www.ruf.rice.edu/~pound/new-lc/

40. Pound, C.: Prop. http://www.ruf.rice.edu/~pound/prop41. Provos, N.: A virtual honeypot framework. In: Proceedings of the

13th USENIX Security Symposium, pp. 1–14 (2004)42. Raymond, J.F.: Traffic analysis: protocols, attacks, design issues,

and open problems. In: Proceedings of Designing Privacy Enhanc-ing Technologies: Workshop on Design Issues in Anonymity andUnobservability, pp. 10–29. Springer, LNCS 2009 (2000)

43. Reed, M.G., Syverson, P.F., Goldschlag, D.M.: Anonymous con-nections and onion routing. IEEE J. Sel. Areas Commun. 16, 482–494 (1998)

44. Reiter, M.K., Rubin, A.D.: Crowds: anonymity for web transac-tions. ACM Trans. Inf. Syst. Secur. 1, 66–92 (1998)

45. Services, O.U.C.: The university of oxford text archive. http://ota.ahds.ac.uk/

46. Spitzner, L.: Honeytokens: the other honeypot. http://www.symantec.com/connect/articles/honeytokens-other-honeypot

47. Spitzner, L.:Honeypots: catching the insider threat. In: Proceedingsof the 19th Annual Computer Security Applications Conference(ACSAC) (2003)

48. Stoll, C.: Stalking the wily hacker. Commun. ACM 31(5), 484–497(1988)

49. Stoll, C.: The cuckoo’s egg: tracking a spy through the maze ofcomputer espionage. Doubleday, New York (1989)

50. Sylpheed-lightweight and user-friendly e-mail client. http://sylpheed.sraoss.jp/en

51. Furry, T.: TOR exit-node doing MITM attacks. http://www.teamfurry.com/wordpress/2007/11/20/tor-exit-node-doing-mitm-attacks/

52. Tor metrics portal. http://metrics.torproject.org/53. Tor metrics portal: number of users. http://metrics.torproject.org/

users.html54. Ts’o, T.: Password generator. http://sourceforge.net/projects/

pwgen/55. Winter, P., Lindskog, S.: Spoiled onions: exposing malicious tor

exit relays. Technical Report, Karlstad University (2014). URLhttp://veri.nymity.ch/spoiled_onions/techreport.pdf

56. Yuill, J., Zappe, M., Denning, D., Feer, F.: Honeyfiles: deceptivefiles for intrusion detection. In: Proceedings of the 2nd IEEEWork-shop on Information Assurance (WIA), pp. 116–122 (2004)

123