User Privacy and the Evolution of Third-party Tracking Mechanisms on the World Wide Web Sonal Mittal May 18, 2010

Sonal Mittal May 18, 2010 - Stanford University

User Privacy and the Evolution of Third-party Tracking Mechanisms on the World Wide Web

Sonal Mittal

May 18, 2010

Abstract

Third-party tracking refers to tracking done by websites that a user never navigates to explicitly. Many Internet users are vaguely aware that their information may be collected online. However, data suggests there is relatively little knowledge about third-party tracking and its associated privacy risks. The FoxTracks software tool attempts to address this lack of knowledge about third-party online tracking for the benefit of interested users with varying levels of technical knowledge. FoxTracks is a Firefox add-on program that browses the web along with the user and collects information about three types of trackers that may be monitoring the user: HTTP cookies, Flash Local Shared Objects, and DOM Storage entries. The interface to FoxTracks displays the user’s information as it has been collected by the trackers; the highly personalized view of third-party tracking is uniquely accessible and informative for end-users. Beyond the development of FoxTracks, the analysis presented in this thesis discusses the history, key players, and motivations of third-party tracking, and how each influenced the design choices made in the software. In particular, the motivations of third-party entities, who are frequently online advertisers, are examined at length. A computer security rubric is then applied to the behavior and tracking methodologies of third parties in order to show their adversarial qualities in matters of user privacy.

Contents

1 Introduction

2 Third-Party Tracking in the Literature and Code

3 HTTP Cookies and Web bugs
  3.1 The Introduction of State Management Mechanisms
  3.2 Advertisers and Personally Identifiable Information
  3.3 HTTP Cookies and Web Bugs in FoxTracks

4 Flash Local Shared Objects
  4.1 A History of LSOs
  4.2 Corporations and Market Incentives
  4.3 Flash Cookies in FoxTracks

5 DOM Storage
  5.1 Web Storage in the W3C Standard
  5.2 Case Study: Gmail Mobile Privacy
  5.3 Community Approach to the Study of DOM Storage

6 Results
  6.1 The FoxTracks Implementation
  6.2 Third Parties as Privacy Adversaries

7 Conclusions

Acknowledgements

Bibliography

Chapter 1

Introduction

Protection of online privacy refers to freedom from unwanted interferences with an Internet

user’s digitally stored, personal data. This includes data residing on a user’s local computer,

data transmitted by a user to remote servers, and data that is generated in the process of

browsing websites: mouse strokes, searches, history, and other page inputs. Evolving Internet

protocols and standards have contributed to a strong emphasis on user self-management of

online privacy rather than legal or regulatory restrictions on how remote servers can collect

and use user information. Users’ general lack of knowledge about the transmission and uses

of personal data, combined with little privacy jurisdiction on the Internet, has left many

paths open for the aggregation and use of individuals’ web data without their given consent.

Many non-expert Internet users are vaguely aware that their information and data may be

collected online. However, surveys of users such as Internet-using college students suggest

that individuals know little about the pervasiveness of online tracking and the kinds of

personal information that can be recorded [9][3]. The surveys show there is particularly little

awareness about the data collection and activity logging undertaken by third-party websites.

Third-party websites are entities that a user never visits explicitly; they are juxtaposed with

first-party websites which correspond to URLs that a user enters in a browser address bar.

Because third-party tracking is undertaken surreptitiously by unfamiliar entities, it is less

transparent than tracking done by first-party websites. Users are able to consult the stated

privacy policies of first-party websites in order to understand how their data and activity

will be monitored. However, without their identities, a user cannot examine the privacy

policies of third-party websites and opt out of third-party tracking. Personal information

collected in an opaque manner by remote entities reflects an especially serious privacy risk

since digital data can be copied and distributed to additional parties with electronic ease.

To increase awareness about third-party tracking and survey common online tracking

methodologies, I created the FoxTracks software tool. FoxTracks is designed as a Firefox

web browser add-on program and was created using JavaScript and XUL utilities. The ul-

timate goal of the software is to educate regular Internet users about the different kinds of

tracking technologies employed by third parties and demonstrate the great extent to which

their activity and data are monitored by unfamiliar entities. To achieve this aim, I designed

FoxTracks to show how three different tracking technologies personally affect the user as she

browses the web. FoxTracks contains three tab panels, one for each of HTTP cookies, Local

Shared Objects, and DOM Storage. Each panel aims to show the user which third parties

are using the technology to track her online activities and what personal information they

have collected. Each panel also provides links to web pages answering potential questions

about the tracking technology, the types of user information at risk, and available opt-out

mechanisms for the technology. The web content associated with each panel was generated

from my research and reviewed by privacy experts at the Center for Democracy and Tech-

nology (CDT).1 The CDT also kindly hosts the FoxTracks web content on their servers.2

Beyond the FoxTracks development cycle, the analysis presented in this thesis discusses the

history, key players, and motivations of third-party tracking, and how each influenced the

design choices made in the software. In the analysis, I also explore the following research

1CDT is a non-profit public interest organization working at the intersection of law, technology, and policy. It is headquartered in Washington, D.C. More information can be found on their website: http://www.cdt.org/

2See http://www.cdt.org/foxtracks/. Because of CDT’s contribution, their logo appears on the FoxTracks tab panels.

questions: Does the evolution of tracking technologies suggest an adversarial relationship

between users and third parties, as in the computer security paradigm? To what extent does

the Internet infrastructure (e.g., HTML standards) facilitate third-party tracking? Should

Internet users be comfortable with opaque information collection, and if not, what kinds of

responses are effective?

For Internet users who are interested in learning more about online privacy and the risks

posed by third-party tracking, FoxTracks is an accessible, all-in-one resource. By showing

users the information that specific third parties have collected about them, FoxTracks demon-

strates privacy risks in a novel, highly personalized way. With a better understanding of

third-party tracking, users are able to make more informed decisions about how they browse

the web. As such, FoxTracks has the ability to synchronize user beliefs about digital privacy

with online behavior in a way that closes the information gap suggested by Internet user

surveys. For instance, FoxTracks adopters may adjust their browser settings and browsing

habits based on how they find their information is collected and used by third parties. At

a minimum, users who learn about third-party tracking will continue their current browsing

patterns in a more transparent environment—one in which third parties have the “informed

consent” of users to track their online activities. In this way, FoxTracks plays an important

public role in spreading information about third-party tracking to online content consumers.

The accompanying written analysis on the history and identity of third parties, and the

potential uses of collected user data also has significance for the public discourse on privacy.

Through my research, I find that third parties have developed increasingly advanced tech-

nologies to combat user efforts to restrict access to personal data. Additionally, they have

significant economic motives to track users and are heavily aided by new HTML standards.

Taken together, these findings suggest that third parties act like adversaries to individual

users under the computer security paradigm. It follows that users should take active, and

perhaps organized, steps to restrict third-party activities online.

This thesis is organized as follows: Chapter 2 contextualizes my software and analysis

within the existing body of computer science literature on third-party tracking. The following

sections explore the three types of third-party tracking technologies included in the FoxTracks

tool. Chapter 3 is an overview of HTTP cookies and web bugs in relation to third-party

tracking, and Chapter 4 examines how third parties make use of Flash object technology.

Chapter 5 explores the new DOM storage feature of the HTML5 standard. Each of these

chapters describes the basic technology, how the technology is modified to serve information

collection purposes, types and potential uses of the information collected by the technology,

and how these facts influenced the design of FoxTracks. Results related to the FoxTracks

software and the proposed research questions are given in Chapter 6. Chapter 6 also reviews

the results by considering some of the limitations of the software. Chapter 7 concludes the

thesis and discusses further directions for research.

Chapter 2

Third-Party Tracking in the Literature and Code

HTTP cookies have been specified in IETF standards since the late 1990s (RFC 2109 in 1997, superseded by RFC 2965 in 2000). Since their introduction,

cookies and web bugs, which are functionally similar to cookies, have generated a great

deal of academic interest. Kristol (2001), author of the original cookie standard, was among

the first to give a general overview of how cookie mismanagement could result in serious

security breaches involving information leakage across domains [6]. Kristol acknowledges

the third-party profiling potential of cookies in correctly implemented cookie management

systems and questions whether users are aware of the tracking potential of cookies. Similar

explanations and concerns have been reiterated as tangential points in numerous academic

computer security papers since 2001. Other researchers have taken a code-based approach

to the analysis of HTTP cookies; motivated by a desire for web transparency, they have de-

veloped software programs for viewing and managing first-party and third-party cookies. A

great deal of this research has come from the public sector with individual programmers and

non-profit advocacy groups undertaking software development and publicizing their work

and findings. The extent of this research is evidenced by the myriad cookie management

programs available for download today.

Ghostery, a popular Firefox add-on, is one of the few programs designed to exclusively

identify and block third-party HTTP cookies and the web bugs associated with them. Unlike

regular cookie managers, Ghostery strives to give users information about the third parties

attempting to set cookies through first-party websites that users visit. Specifically, it alerts

users about the identities and first-party associations of third-party trackers in real-time

using a menu located in the bottom of the Firefox status bar. The menu also links to more

information about the identified trackers. This informational quality of Ghostery addresses

a deficiency of popular cookie managers—an average Internet user without a precise under-

standing of third-party cookies and web bugs can easily use Ghostery and learn about online

tracking in the process. Though Ghostery is the closest content analog to FoxTracks, it does

not provide a user with a complete picture of how third parties log user activity on the web.

FoxTracks attempts to address this informational deficiency by compiling a history of all

first-party websites on which a given third-party tracker has been found in order to show the

browsing history profile that the tracker compiles.

Local Shared Objects (LSOs) refer to client-side, remotely-accessible storage. Adobe

makes use of LSOs in its popular Flash video player. Much of the interest in LSO im-

plementations has come from the private sector, with various web company white papers

outlining the potential of Adobe’s Flash LSOs as client-side storage bins. LSOs in general

have been examined extensively in systems and networking literature as persistent storage

bins for communication between machines. They have been explored as a mechanism for

executing malicious attacks on host computers, and their third-party tracking potential is

frequently noted in privacy papers. Soltani et al. (2009) were the first to present a holistic

picture of Flash LSO usage and web practices [13]. They conclude that Flash LSOs present

a substantial privacy threat because of their third-party capabilities and their obfuscation

from users, which gives them an especially persistent nature.

As with HTTP cookies, a great deal of LSO privacy analysis has come from the public

sector. Organizations such as the Electronic Privacy Information Center (EPIC) have com-

piled public fact sheets on how third parties use LSOs to track users and many individual

developers have created stand-alone and browser add-on programs for Flash LSO manage-

ment. In the Firefox add-on tradition, BetterPrivacy is the most widely used LSO removal

and editing tool. BetterPrivacy lists all the LSOs that currently exist on a user’s machine

and allows a user to select the frequency and timing of LSO deletion. Despite a user-friendly

design, BetterPrivacy adopts the “install and forget” browser add-on model and thus pri-

marily benefits users with an understanding of LSOs and their risks. The add-on may be

less useful for users who do not have a good understanding of first-party and third-party

LSOs, or the nature of the privacy threats that they present. With FoxTracks, my goal was

to build on BetterPrivacy’s control over obfuscated LSOs by providing a visual explanation

of how third parties may set and use LSOs. By illuminating actual LSO tracking of a user’s

web activities, I extend LSO control to a wider, non-technical audience.

Unlike HTTP cookies and LSOs, DOM Storage remains relatively unexplored in the aca-

demic and private sector literature. Where it does appear, it is studied for its efficiency

properties as a remote-access storage space. Between public sector organizations and indi-

vidual web developers, no software tools have been developed to specifically examine DOM

Storage contents or to clarify the exact contents to users. Some web pages hosted by privacy

interest groups like the Electronic Frontier Foundation (EFF) briefly describe the location

of DOM contents without more specific information or instances of DOM Storage use in

user tracking. FoxTracks aims to bring DOM Storage more fully into the privacy discourse

by exposing its contents to users and beginning to clarify its role in third-party tracking.

FoxTracks is thus a unique development in the body of work on DOM Storage.

In sum, there is a substantial amount of literature on HTTP cookies, significant work on

LSOs, and relatively little information about the uses of DOM Storage in tracking. While the

existing literature may inform interested users about these technologies, it provides high-level

explanations rather than a personally relevant demonstration of privacy invasion. The latter

is a more accessible and tangible educational experience for users with little background in

online privacy. Because current software tools that do show how users are personally af-

fected by tracking require intermediate technological knowledge, I designed FoxTracks to be

relevant and accessible to all Internet users. Surpassing the traditional storage management

model in favor of a personalized approach allows non-expert users to see how tracking figures

actively in their browsing experience. The accessibility of FoxTracks comes from its interface

and its all-in-one nature. Including all three tracking technologies in a single privacy man-

agement tool provides a novel, holistic survey of third-party tracking on the web. Moreover,

each technology represents an advance in the capabilities and persistence of trackers, which

informs users about the evolution and growth of third-party trackers. FoxTracks aims to

increase general knowledge and awareness of third-party tracking through these educational

and accessible qualities.

Chapter 3

HTTP Cookies and Web bugs

3.1 The Introduction of State Management Mechanisms

As the complexity of web applications grew in the late 1990s, the Internet Engineering Task

Force (IETF)1 recognized the value of adopting an HTTP state management mechanism.2

The IETF believed such a mechanism could support virtual shopping carts for e-commerce

and improve the user browsing experience by “remembering” preferences for websites [7].

The state management mechanism adopted was the HTTP cookie (cookie). Cookies are

small pieces of text that servers can set and read from a client computer in order to register

its “state.” They have strictly specified structures and can contain no more than 4 KB of

data each. When a user navigates to a particular domain, the domain may call a script

to set a cookie on the user’s machine. The browser will send this cookie in all subsequent

communication between the client and the server until the cookie expires or is reset by the

server.
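The exchange described above can be sketched in a few lines of Python. This is only an illustration of the mechanism: the header value and the cookie name `session_id` are made up for the example, and a real browser applies many more rules (domain and path matching, expiry, and so on) before sending a cookie back.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value sent by a server (illustrative only).
set_cookie_header = "session_id=abc123; Path=/; Max-Age=3600"

# The browser parses the header and stores the cookie with its attributes.
jar = SimpleCookie()
jar.load(set_cookie_header)


def cookie_request_header(jar):
    """Build the Cookie header value attached to later requests to the server."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


print(cookie_request_header(jar))
```

Only the name and value travel back to the server; attributes such as `Path` and `Max-Age` stay in the browser and govern when the cookie is sent and when it expires.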

As predicted by the IETF, cookies have been used to improve the functionality of many

websites. For example, they have been used to implement online shopping carts, cache data

1The IETF is an open standards organization that works with similar groups to propose and review Internet standards.

2HTTP refers to the HyperText Transfer Protocol, which governs how requests are sent over the Internet. These requests are stateless; in other words they do not carry any configuration information about the systems exchanging requests.

form values, personalize website views, and transmit user authentication credentials [11].

Such use of cookies improves the user browsing experience and in turn benefits websites that

receive more visitors. However, cookies can also compromise user privacy in many ways. At

the time of adoption, the IETF described the cookie’s potential for cross-domain information

exchange, a particularly serious threat to user privacy. The following text appears under the

header of “Unexpected Cookie Sharing” in the IETF’s Request for Comments (RFC) 2965

document explaining the new cookie standard:

A user agent should make every attempt to prevent the sharing of session infor-

mation between hosts that are in different domains. Embedded or inlined objects

may cause particularly severe privacy problems if they can be used to share cook-

ies between disparate hosts. For example, a malicious server could embed cookie

information for host a.com in a URI for a CGI on host b.com.3 User agent im-

plementors are strongly encouraged to prevent this sort of exchange whenever

possible.

Users can navigate to webpages that load content such as images or advertisements from

third-party servers. Because a third-party server establishes a connection to a user’s machine

when its contents are loaded on a first-party website, the third party is able to set a cookie

on the user’s machine. Cookies set by these third parties have the potential to track a user’s

browsing habits. To see how this is possible, consider an image that is stored on a.com’s

servers and loaded on two websites: b.com and c.com. If a user navigates to b.com, a.com

can set a cookie containing a unique alphanumeric string on the user’s machine and associate

b.com with that string somewhere on its own servers. When the user next navigates to c.com,

a.com will read the cookie it previously set on the user’s machine. It can then recognize

the unique string contained in the cookie and associate c.com with the string. a.com now

3A URI identifies a resource and a CGI script generates content on the server; here both stand for content that exists outside the immediate context.

has a small profile of the user’s browsing habits and can grow this profile along the order of

the number of websites that host its content. Thus, agents interested in tracking users are

able to exploit the cookie state management mechanism to capture users’ browsing habits

without their knowledge or consent.
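The a.com, b.com, and c.com scenario can be made concrete with a toy simulation. The sketch below is an assumption-laden model rather than real tracker code: the class names `ThirdPartyTracker` and `Browser` are hypothetical, and the network exchange is reduced to a single method call that receives whatever cookie the browser holds for the tracker's domain.

```python
import uuid
from collections import defaultdict


class ThirdPartyTracker:
    """Toy model of a.com, whose image is embedded on several first-party sites."""

    def __init__(self):
        # Maps each unique cookie string to the first-party sites it was seen on.
        self.profiles = defaultdict(list)

    def serve_image(self, cookie, first_party):
        # No cookie yet: mint a unique string to set on the user's machine.
        user_id = cookie or uuid.uuid4().hex
        self.profiles[user_id].append(first_party)
        return user_id  # value the browser stores and sends back next time


class Browser:
    def __init__(self):
        self.cookies = {}  # tracker domain -> cookie value

    def visit(self, first_party, tracker_domain, tracker):
        # Loading the embedded image sends any existing cookie for that domain.
        sent = self.cookies.get(tracker_domain)
        self.cookies[tracker_domain] = tracker.serve_image(sent, first_party)


a_com = ThirdPartyTracker()
browser = Browser()
browser.visit("b.com", "a.com", a_com)
browser.visit("c.com", "a.com", a_com)
print(dict(a_com.profiles))
```

After the two visits the tracker holds a single profile listing both b.com and c.com, even though the user never navigated to a.com.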

Web bugs are functionally similar to cookies set by third parties. Web bugs affiliated

with particular third parties are embedded objects loaded from third-party servers that are

invisible to users. Unlike third-party cookies, they do not set any data on a user’s computer.

Rather, they collect a user’s IP address, browser type, the current first-party URL, and read

any unique-string cookies that have been set by the third party in the past. If such a cookie

is found, the server is able to augment its profile of the user with the current first-party

URL. In this way, web bugs can be used in conjunction with third-party cookies to facilitate

user tracking.

3.2 Advertisers and Personally Identifiable Informa-

tion

The previous section explains how third-party cookies can be used to track a user’s browsing

habits. However, understanding the full privacy consequences of third-party cookie tracking

requires an examination of third parties and the anonymity of user browsing profiles. I used

the Ghostery Firefox add-on to examine which organizations are carrying out third-party

tracking using cookies and web bugs. Ghostery identifies third-party trackers on a given web

page by first searching the underlying HTML of a web page for script tags. Content loaded

from a domain different from that of the primary URL must be loaded through the script

syntax. Once Ghostery has acquired all objects associated with script tags, it compares the

objects to a database of known third-party trackers in order to determine whether any of the

scripts loaded third-party trackers. This database is the most comprehensive list of known

third-party trackers that make use of cookies and web bugs. It includes over 200 trackers,

and a small sample is given in Table 3.1.

Table 3.1: Some third-party trackers contained in the Ghostery database.

Tracker Name
Google Analytics
Quantcast
SiteMeter
Omniture
Facebook Connect
Google Adsense
Doubleclick
Tacoda
WebTrends
AddThis
Revenue Science

The tracking agents listed in Table 3.1 are primarily ad networks and behavioral data

providers. Ad networks connect advertisers who want to reach potential customers with

sites that want to sell advertisement space. This business model allows ad networks to reach a

wide spectrum of small and medium-sized websites interested in taking on advertisements.

In 2009, 30% of the $8 billion spent on online advertising went to ad networks rather than

direct websites selling advertising space [5]. One advantage ad networks have over individ-

ual websites selling advertising space is the ability to display the same ad across multiple

websites that a single user visits. This advantage attracts advertisers who desire multiple

ad impressions per user [10]. Thus, being able to accurately track user browsing habits has

significant business consequences for ad networks. It follows that ad networks have a strong

incentive to use third-party cookies and web bugs to track users across as many sites as pos-

sible. Like ad networks, behavioral data providers aim to track and organize user browsing

patterns. However, behavioral data providers are solely in the business of stratifying users

for receiving targeted advertisements and not involved with the advertising process itself.

Such companies work with websites or ad networks to suggest relevant ads for different users

based on browsing history data collected through third-party cookies and web bugs. For

both types of companies, more user tracking yields more user data points that translate to

improved business.

Targeted advertising datasets containing users’ browsing habits can be augmented by

precise demographic data. Though the primary use of third-party tracking cookies and

web bugs is the aggregation of a user’s browsing history, cookies can also be used to determine

demographic information about the user. Cookies and web bugs have access to primary page

URLs, which may leak pieces of personal data such as login name or data form information.

Third parties may process this information and associate it with the browsing profile and

unique string for the user [12]. This association is a serious threat to user privacy because

it may de-anonymize the browsing history profile that was otherwise only connected to an

alphanumeric string. The browsing profile newly associated with a specific person and/or

her demographic information might be sold or publicized at the discretion of a tracking

company, resulting in a serious breach of user privacy.
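As a hedged illustration of how a first-party URL can leak personal data, the sketch below parses the query string of a page URL that a third party might receive along with its request. The URL, the parameter names, and the helper `leaked_fields` are all hypothetical; real sites vary widely in what they place in their URLs.

```python
from urllib.parse import urlsplit, parse_qs


def leaked_fields(first_party_url, interesting=("user", "login", "email", "zip")):
    """Return query parameters that look like personal data (illustrative list)."""
    query = parse_qs(urlsplit(first_party_url).query)
    return {key: values[0] for key, values in query.items() if key in interesting}


# Hypothetical first-party URL visible to an embedded third party.
url = "http://social.example/profile?user=jdoe&tab=photos&zip=94305"
print(leaked_fields(url))
```

A third party that stores these values next to its unique cookie string has tied an otherwise pseudonymous browsing profile to a login name and a postal code.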

3.3 HTTP Cookies and Web Bugs in FoxTracks

Following the Ghostery add-on model, FoxTracks works by reading in the underlying HTML

of a web page on loading and searching for script tags which are required when content

is loaded from another domain. All lines containing scripts are then examined to see if

they contain an external source and if that source can be identified as a third-party tracker.

Specifically, the domain name of the source is checked against the Ghostery database con-

taining information about 200 known trackers. The Ghostery database is included as a

raw file in the FoxTracks add-on. I have kept the original Ghostery database format and

methodology to maximize compatibility with simultaneous use of Ghostery. Each “entry”

in the file is specified as a {tracker name, tracker search pattern} pair. The search patterns

were determined by the Ghostery development team, which collects new third-party tracker

submissions from Ghostery users at large.
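The matching step can be sketched as follows. This is not the FoxTracks or Ghostery source, which is a JavaScript browser add-on; it is a Python approximation under stated assumptions. The two entries in `TRACKER_PATTERNS` are invented placeholders in the {tracker name, tracker search pattern} shape described above, and the names `ScriptSourceCollector` and `find_trackers` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical {tracker name: search pattern} entries; not the real Ghostery data.
TRACKER_PATTERNS = {
    "Example Analytics": re.compile(r"analytics\.example\.com"),
    "Example Ads": re.compile(r"ads\.example\.net/serve"),
}


class ScriptSourceCollector(HTMLParser):
    """Collects the src attribute of every script tag that has one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no external source
                self.sources.append(src)


def find_trackers(html):
    """Return the names of known trackers whose pattern matches a script source."""
    collector = ScriptSourceCollector()
    collector.feed(html)
    found = []
    for src in collector.sources:
        for name, pattern in TRACKER_PATTERNS.items():
            if pattern.search(src) and name not in found:
                found.append(name)
    return found


page = """
<html><head>
<script src="http://analytics.example.com/ga.js"></script>
<script>var inline = true;</script>
<script src="/static/site.js"></script>
</head></html>
"""
print(find_trackers(page))
```

Inline scripts and same-site scripts fall through without a match, so only the externally sourced tracker is reported for this page.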

After verifying the presence of a third-party tracker, FoxTracks returns the name of

the tracker and associates it with the first-party website on which it was found. FoxTracks

manages a SQLite database to hold these associations. The database, called trackerBase.sqlite,

is updated with a {third-party tracker, first-party website} pair whenever a third party is

identified on a page. The HTTP cookies and web bugs tab in FoxTracks provides a table

view of trackerBase.sqlite, which features two columns, “Tracker” and “Origin” (see Figure

3.1). An end-user can re-sort the table by column. The tracker-based sort displays all entries

containing the same tracker consecutively; this view provides the user with a snapshot of the

personal browsing profiles different trackers have compiled. These profiles are consistent with

the browsing profiles that each tracker stores on its servers. The origin-based sort provides

a history of all the third-party cookies and web bugs that have ever tracked the user on a

particular website that the user has visited. This information is useful for users who may

want to adjust their website usage based on concerns about third-party tracking.
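A minimal sketch of the pair storage and the two sort orders is given below, using Python's built-in `sqlite3` module. The table name `sightings`, the uniqueness constraint, and the helper names are assumptions made for illustration; the text specifies only the database file name and the two columns, and an in-memory database stands in for trackerBase.sqlite here.

```python
import sqlite3

# In-memory stand-in for trackerBase.sqlite; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sightings ("
    " tracker TEXT NOT NULL,"
    " origin TEXT NOT NULL,"
    " UNIQUE (tracker, origin))"
)


def record(tracker, origin):
    """Store a {third-party tracker, first-party website} pair once."""
    conn.execute(
        "INSERT OR IGNORE INTO sightings (tracker, origin) VALUES (?, ?)",
        (tracker, origin),
    )


def rows_sorted_by(column):
    """Return all pairs ordered by the chosen column, as in the table view."""
    if column not in ("tracker", "origin"):
        raise ValueError("column must be 'tracker' or 'origin'")
    other = "origin" if column == "tracker" else "tracker"
    query = f"SELECT tracker, origin FROM sightings ORDER BY {column}, {other}"
    return conn.execute(query).fetchall()


record("Quantcast", "news.example")
record("Google Analytics", "shop.example")
record("Quantcast", "shop.example")
record("Quantcast", "news.example")  # repeat visit, ignored by the constraint

print(rows_sorted_by("tracker"))
print(rows_sorted_by("origin"))
```

Sorting by tracker groups every site on which one tracker was seen, which approximates that tracker's browsing profile of the user; sorting by origin groups every tracker seen on one site.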

To understand the FoxTracks interface, users require contextual information about cook-

ies and web bugs. Thus, FoxTracks provides links to web content addressing general questions

about cookies and web bugs, which third parties use these technologies, and what kinds of

information third parties can collect. The software also provides links to information about

opting out of third-party tracking with cookies and web bugs. Specifically, FoxTracks links to

instructions for blocking third-party cookies using built-in browser settings. It also links to

the latest version of Ghostery, which provides a mechanism for blocking web bug activity. By

providing informational links as well as a personalized demonstration of third-party cookie

and web bug tracking, FoxTracks engages and informs users about online privacy risks.

Figure 3.1: Screenshot of the FoxTracks HTTP Cookies and Web Bugs panel.

Chapter 4

Flash Local Shared Objects

4.1 A History of LSOs

Local Shared Objects are a class of remotely-accessible, client-side storage bins. Flash LSOs

were first used to store settings preferences in Macromedia’s Flash Player 6 in 2002. They

have been included in every subsequent version of the Flash Player, from Macromedia Flash

Player 7 to Adobe Flash Player 10 (Macromedia was acquired by Adobe in 2005). When

a Flash application is loaded on a page, a website is able to set an associated Flash LSO

without prompting the user for permission. These LSOs are formatted as .sol files and can

hold up to 100 KB of data. Additionally, they do not have an expiration date and are

located in a single system folder that is available to all users and browsers on a machine.

These characteristics of LSOs suggest they are more persistent data stores than HTTP

cookies. Greater persistence and storage size offer a number of benefits as users consume more

data-intensive web content like streaming music and video. LSOs are able to improve media

playback by storing video preferences or caching large amounts of data that would otherwise

have to be repeatedly retrieved from servers.

However, this technology can also be detrimental to user privacy. Adobe Flash is a

standalone program that is independent from the browser. Most browsers, including Firefox,

do not provide any control mechanisms over the setting and accessing of Flash LSOs, nor do

they prompt the user for permission to interact with Flash LSOs. Furthermore, Flash-based

applications on a given web page may not be visible to the user. It follows that users who are

unaware of LSOs have no control over the setting of LSOs on their machine. These concerns

are aggravated by the wide variety of information that can be stored in LSOs. According to

a Macromedia whitepaper on LSOs, the type of information that can be contained in a .sol file

is limited only by the information to which the Flash application has access. This includes

any content in the Flash application file, information that the user provides to the website or

the Flash application, configuration information about the user’s machine for video content

playback, and other LSOs associated with the same domain [4].

Flash LSOs can also be used for third-party tracking purposes. In 2005, United Vir-

tualities, an online advertising company, published a statement on the use of LSOs in an

online environment with increased user awareness and deletion of third-party HTTP cookies

[15]. Like third-party HTTP cookies, third-party LSOs with unique identifying strings can

be loaded through first-party websites. These third-party LSOs can then be used to compile

an enhanced browsing profile of an individual who navigates to multiple websites that load

content from the third party. Because this tracking methodology is very similar to that of

third-party HTTP cookies, LSOs are also known as “Flash cookies.” There has been little

work on identifying how and when Flash cookies are set on a user’s machine. Without this

kind of information, it is difficult to discern first-party Flash cookies from third-party Flash

cookies set by companies for tracking purposes.

4.2 Corporations and Market Incentives

Soltani et al. addressed the lack of Flash cookie data by using survey techniques to find out

which websites regularly employed first-party and third-party Flash cookies. They surveyed

the 100 most-visited websites (as of July 2009) and found that 54 sites set a total of 157

Local Shared Objects that produced 281 Flash cookies. 31 of these sites also marked their

Flash cookies with a unique identifying string that matched a unique identifier contained in

an HTTP cookie set by the same site. Upon investigation, Soltani et al. found that when

the corresponding HTTP cookies were deleted, a new HTTP cookie set by the website would

contain the same identifier. This behavior suggests that Flash cookies actually “respawn”

deleted HTTP cookies [13]. While their research doesn’t explore the possibility of browser

setting uniqueness that would allow identification by a website, further evidence of cookie

respawning is given by the United Virtualities statement on the use of Flash cookies to

defend against cookie deletion. In a March 2005 statement, the company wrote,

All advertisers, websites and networks use [HTTP] cookies for targeted advertis-

ing, but cookies are under attack. . . . [We] developed a backup ID system for

cookies set by web sites, ad networks and advertisers, but increasingly deleted

by users. UV’s ‘Persistent Identification Element’ (PIE) is tagged to the user’s

browser, providing each with a unique ID just like traditional cookie coding.

However, PIEs cannot be deleted by any commercially available anti-spyware,

mal-ware, or adware removal program. They will even function at the default

security setting for Internet Explorer.
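The respawning behavior observed by Soltani et al. can be summarized in a minimal sketch. It is a toy model written in Python for illustration, not code recovered from any advertiser: the `Browser` class, the `visit` function, and the domain `tracker.example` are all hypothetical. The only assumption carried over from the text is that the same identifier is kept in two stores, and that only one of them is cleared when the user deletes HTTP cookies.

```python
import uuid


class Browser:
    """Toy model of two independent client-side stores."""

    def __init__(self):
        self.http_cookies = {}  # cleared by the browser's cookie controls
        self.flash_lsos = {}    # kept by the Flash plugin, outside those controls

    def clear_http_cookies(self):
        self.http_cookies.clear()


def visit(browser, domain):
    """Simulate the tracker's script running on a page load; return the user ID."""
    user_id = browser.http_cookies.get(domain)
    if user_id is None:
        # Cookie missing: fall back to the Flash copy, or mint a new ID.
        user_id = browser.flash_lsos.get(domain) or uuid.uuid4().hex
        browser.http_cookies[domain] = user_id  # "respawn" the HTTP cookie
    browser.flash_lsos[domain] = user_id        # keep the backup copy in sync
    return user_id


browser = Browser()
first_id = visit(browser, "tracker.example")
browser.clear_http_cookies()                    # the user deletes HTTP cookies
second_id = visit(browser, "tracker.example")
print(first_id == second_id)                    # prints True
```

In this model the identifier changes only if both stores are cleared, which is why cookie deletion alone does not end the tracking relationship.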

Of the 31 domains with Flash cookies that respawned HTTP cookies, Soltani et al. iden-

tified eight as advertising companies and four as first-party domains. The eight advertisers

in Table 4.1 constitute the only definitive list of third parties known to use Flash cookies in

a way that intentionally circumvents user efforts to delete HTTP cookies. Others may be

found by searching personal LSO collections, but this approach to identifying third parties

is subject to scrutiny by the web community at large. Of the companies listed in Table 4.1,

many publicly disclose their ability to collect large quantities of highly specific user infor-

mation such as zip code and income bracket. VideoEgg alone has a 100 million-person user

base through its distribution across 500 websites [2]. These advertisers have incentives to

Table 4.1: Companies using Flash cookies that respawn HTTP cookies.

Company Name
ClearSpring
Iesnare
InterClick
ScanScout
SpecificClick
QuantCast
VideoEgg
Vizu

override user steps to protect privacy as outlined in Section 3.2. The tractable number of

advertisers known to use third-party Flash cookies also allowed me to examine more specific

industry incentives to ignore concerns about user privacy on the web. Public records of

venture funding show that three of the private advertisers in Table 4.1 (ClearSpring Technologies,

QuantCast, and VideoEgg) have received over $110 million in venture capital funding

from 2005 to 2010 [2]. Other advertisers have been recently honored with accolades such as

“a top 10 most innovative company.” This kind of monetary and industry support suggests

that these companies are rewarded for intrusions into user privacy. It also suggests they face

little to no opposition from organized web users or other interest groups that could weaken

their business model by preventing tracking or inducing concern among venture funders. The

lack of concern for user privacy demonstrated by funders and industry reinforces the need

for an educational tool that increases awareness of tracking with Flash cookies.

4.3 Flash Cookies in FoxTracks

While third-party tracking is of particular research interest due to its intentionally obfuscated

nature, the difficulty in determining when Flash cookies are set and accessed prevented me

from focusing solely on third-party Flash cookies in FoxTracks. The current method of Flash

cookie access detection is an examination of the local LSO folder for new LSOs and changes

in last-access timestamps on every page load [8]. This method results in noticeable browser

slow-down when the folder size is large and when significant numbers of Flash cookies are

being accessed on a single page. Significant browser latency is a disincentive to add-on usage,

so I chose not to use this method. I have concluded that an ideal model for third-party

Flash cookie detection would parallel the Ghostery method of finding third-party HTTP

cookies and web bugs: scanning the HTML of a page for script tags and comparing the

commands contained within them to strings naming known third-party trackers. While the

eight companies identified by Soltani et al. constitute the beginnings of a database of known

third-party Flash cookie trackers, I intend to use the Ghostery model of community-based

input and review to compile a larger database for inclusion in later development. Once this

is a strong resource, FoxTracks can display companies that use third-party Flash cookies and

how they have personally tracked the user over time.
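The script-scanning model described above can be sketched as follows. This is an illustration in Python under stated assumptions, not Ghostery's or FoxTracks's actual code: the signature list `KNOWN_TRACKERS`, the tracker names, and the `.example` domains are hypothetical placeholders for a community-maintained database.

```python
from html.parser import HTMLParser

# Hypothetical signature database: substring to look for -> tracker name.
KNOWN_TRACKERS = {
    "tracker-a.example": "Tracker A",
    "tracker-b.example": "Tracker B",
    "tracker-c.example": "Tracker C",
}


class ScriptCollector(HTMLParser):
    """Collects the src attribute and inline body of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.snippets = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.snippets.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.snippets.append(data)


def find_trackers(html):
    """Return the sorted names of known trackers referenced by script tags."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    found = set()
    for snippet in collector.snippets:
        for signature, name in KNOWN_TRACKERS.items():
            if signature in snippet.lower():
                found.add(name)
    return sorted(found)


page = """
<html><body>
<script src="http://cdn.tracker-a.example/beacon.js"></script>
<script>var img = new Image(); img.src = "http://pixel.tracker-b.example/p.gif";</script>
<script src="/static/site.js"></script>
</body></html>
"""
print(find_trackers(page))  # prints ['Tracker A', 'Tracker B']
```

Substring matching keeps the scan cheap on every page load, at the cost of missing trackers whose scripts are not yet in the database. That trade-off is why the quality of the community-built list matters.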

In order to gather community input, FoxTracks must first be adopted by a user base. In

this version of FoxTracks, I have opted to include a “view and delete” interface into a machine’s

LSO folders (see Figure 4.1). My interface accesses and lists all Flash cookies on a user’s

machine in table format. Information displayed about each Flash cookie includes origin, i.e.,

with which domain the object is affiliated; name, e.g., “settings.sol;” size in bytes; and the

date and time a specific cookie was last accessed by a website the user visited. The interface

also includes information about the location of the LSO folder on the user’s machine and

buttons to delete the listed Flash cookies individually or altogether. The origin and name

information is generally sufficient to understand the owner and purpose of a particular Flash

cookie. When a user is aiming to delete tracking Flash cookies and maintain preferences for

various websites stored in other Flash cookies, the origin and name can be used to decide

whether a particular cookie should be deleted or not. The size and latest access time might

also provide insight into the quantity and frequency of information collection by websites

the user has never visited explicitly. The view generated in FoxTracks resembles a simplified

version of the functionality in the most popular Firefox Flash cookie add-on, BetterPrivacy.
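The four fields shown in the table view can all be derived from the file system. The sketch below illustrates this in Python; FoxTracks itself is implemented in XUL and JavaScript, so this is not its code. The function name `list_flash_cookies` is hypothetical, and the sketch makes the simplifying assumption that each .sol file sits in a folder named after the domain that set it. Real LSO folder layouts may nest deeper.

```python
import os
import tempfile
import time


def list_flash_cookies(lso_root):
    """Return one record per .sol file found under lso_root."""
    records = []
    for folder, _subdirs, files in os.walk(lso_root):
        for filename in files:
            if not filename.lower().endswith(".sol"):
                continue
            path = os.path.join(folder, filename)
            info = os.stat(path)
            records.append({
                # Assumption: the parent folder is named after the setting domain.
                "origin": os.path.basename(folder),
                "name": filename,
                "size_bytes": info.st_size,
                "last_accessed": time.strftime(
                    "%Y-%m-%d %H:%M:%S", time.localtime(info.st_atime)),
                "path": path,
            })
    return sorted(records, key=lambda r: (r["origin"], r["name"]))


# Demonstration against a throwaway folder instead of a real LSO directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "video.example"))
    with open(os.path.join(root, "video.example", "settings.sol"), "wb") as handle:
        handle.write(b"\x00" * 128)
    for record in list_flash_cookies(root):
        print(record["origin"], record["name"],
              record["size_bytes"], record["last_accessed"])
```

Because the listing is built on demand rather than on every page load, it avoids the browser slow-down associated with continuous folder monitoring.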

Figure 4.1: Screenshot of the FoxTracks Flash Objects panel.

A version of BetterPrivacy’s automatic deletion feature is included in the advanced op-

tions pane of FoxTracks, which is shown in Figure 4.2. These options have been separated

from the main Flash objects tab in order to simplify the tool and thereby further its edu-

cational and informational goals. The options for automatic Flash cookie deletion are more

restricted than those offered by BetterPrivacy, which functions as an “install and forget”

add-on for users with an intermediate understanding of Flash cookies. FoxTracks allows

the user to select between complete deletion at every session ending, timer-based deletion of

infrequently-accessed Flash cookies, and adding an option to clear Flash cookies to the built-

in Firefox “Clear Recent History” dialog box. Other advanced options include clearing the

Adobe Flash Player settings LSO that contains playback preferences in addition to a history

of all visited websites that use Flash, and clearing empty folders left over from deleted .sol

files. By leaving out some BetterPrivacy functionality such as a “white-list” of perpetually

allowable Flash cookies, and limiting options for automatic Flash cookie deletion, FoxTracks

aims to focus the user’s attention on the origins and purposes of the LSOs that have been

set on her machine.
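Timer-based deletion and empty-folder cleanup can be sketched in a few lines. The Python below is an illustration only, with hypothetical function names and a hypothetical 30-day threshold. It also assumes the file system records last-access times, which is not the case on volumes mounted with access-time updates disabled.

```python
import os
import tempfile
import time


def delete_stale_flash_cookies(lso_root, max_idle_days, now=None):
    """Delete .sol files not accessed within max_idle_days; return deleted paths."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 24 * 60 * 60
    deleted = []
    for folder, _subdirs, files in os.walk(lso_root):
        for filename in files:
            path = os.path.join(folder, filename)
            if filename.lower().endswith(".sol") and os.stat(path).st_atime < cutoff:
                os.remove(path)
                deleted.append(path)
    return deleted


def remove_empty_folders(lso_root):
    """Remove folders left empty after deletion, deepest first; return the count."""
    removed = 0
    for folder, _subdirs, _files in os.walk(lso_root, topdown=False):
        if folder != lso_root and not os.listdir(folder):
            os.rmdir(folder)
            removed += 1
    return removed


# Demonstration against a throwaway folder instead of a real LSO directory.
with tempfile.TemporaryDirectory() as root:
    stale = os.path.join(root, "tracker.example", "id.sol")
    fresh = os.path.join(root, "video.example", "settings.sol")
    for path in (stale, fresh):
        os.makedirs(os.path.dirname(path))
        open(path, "wb").close()
    long_ago = time.time() - 90 * 24 * 60 * 60
    os.utime(stale, (long_ago, long_ago))  # pretend it was last read 90 days ago
    print(delete_stale_flash_cookies(root, max_idle_days=30))
    print(remove_empty_folders(root))
```

Walking the tree bottom-up in `remove_empty_folders` lets a parent folder be removed in the same pass once its last child folder is gone.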

Figure 4.2: Screenshot of the FoxTracks options dialog box.

Like the HTTP cookies and web bugs tab, the Flash objects tab also includes a sidebar

with informational links to relevant web content. These include descriptions of what Flash

cookies are, which known organizations are using them to track user behavior on their own

websites or across other websites, the types of information that can be gleaned about a user

through the use of Flash cookies, and how to opt out of being tracked by Flash cookies, either

by deleting them or managing them centrally through the Adobe website. Instead of the

deletion-based approach to controlling Flash cookies, users may use a formal blocking and

storage limitation scheme to stop tracking. Adobe’s website provides a Global Settings panel

that allows users to block all third-party Flash cookies and/or set storage size capacities

for all Flash cookies. Research done by the Electronic Frontier Foundation on surveillance

technologies suggests the former option may seriously impair some websites’ functionality

and recommends the latter approach, setting all storage capacities to zero. However, this

may result in loss of settings preferences information for first-party websites in exchange for

removing all tracking possibility. The FoxTracks web reference for using Adobe’s central

LSO manager describes the options available to users in full.

Chapter 5

DOM Storage

5.1 Web Storage in the W3C Standard

The third and final storage-based tracking technology I examined in the course of my research

was DOM Storage. DOM Storage is proposed as an improved state management mechanism

in working drafts of the HTML 5 standard that is set to be adopted by the World Wide

Web Consortium (W3C) [footnote 1] in late 2010. As of December 2009, DOM Storage specifications

have been spun off into a distinct working document entitled “Web Storage” for independent

review and adoption. Though it is only recently that DOM Storage is being considered as

a formal Internet standard, popular web browsers have included DOM Storage capabilities

since 2006. Notably, DOM Storage space was first included in Firefox 2.0 and has been

supported through the current version, Firefox 3.6 [1]. Despite its pending W3C adoption,

it is also included in the latest versions of Safari, Internet Explorer, Chrome, and Opera, all

of which were released between 2008 and 2009.

DOM refers to the legacy term “document object model,” and serves little purpose in

describing this browser storage space. Like HTTP cookies, DOM storage is a mechanism

for maintaining a user’s state with a particular website. It is designed as a large storage

bin that exists locally on a client’s machine. According to the W3C working draft on DOM

[Footnote 1] The W3C is a standards organization like the IETF.

Storage, the mechanism offers two benefits over regular cookies. First, it prevents race

conditions that can occur during simultaneous browsing sessions. For instance, when two

browser windows navigate to the same site, cookie data that is transmitted in each session

may get overwritten or aggregated in a way that results in unexpected behavior. The W3C

specification solves this problem by providing a single session storage space for each brows-

ing session. This space will only ever be accessed by one window and thus prevents state

confusion from multiple connections to the same domain. Additionally, all session-only data

will be discarded on window close or browser exit, so no conflicts will manifest under this

model. The second advantage of DOM Storage is its much larger size than regular cookies.

Allowing for megabytes of persistent storage on the client-side of communication allows for

website performance enhancements in the way of a large cache.
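The first benefit can be made concrete with a toy model of two windows open on the same site. The Python sketch below is illustrative only: the `Browser` and `Window` classes and the `current_item` key are hypothetical, and real browsers implement both stores very differently. It shows only the scoping difference, namely one cookie jar shared by every window versus one session store per window.

```python
class Browser:
    def __init__(self):
        self.cookies = {}  # one jar per browser, shared by every window

    def open_window(self):
        return Window(self)


class Window:
    def __init__(self, browser):
        self.browser = browser
        self.session_storage = {}  # private to this window


browser = Browser()
window_a = browser.open_window()
window_b = browser.open_window()

# Both windows record "the item being purchased" for the same site,
# first window A and then window B.
for window, item in ((window_a, "ticket-to-Boston"), (window_b, "ticket-to-Denver")):
    window.browser.cookies["current_item"] = item
    window.session_storage["current_item"] = item

print(window_a.browser.cookies["current_item"])  # ticket-to-Denver: B overwrote A
print(window_a.session_storage["current_item"])  # ticket-to-Boston: A's state intact
```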

While it offers some advantages over HTTP cookies, DOM Storage presents the same

third-party tracking risks as regular cookies. Additionally, the collection of highly specific

user data kept in DOM Storage increases the seriousness of any privacy intrusions by third

parties. The W3C is conscious of these user privacy concerns posed by DOM Storage adop-

tion:

A third-party advertiser (or any entity capable of getting content distributed to

multiple sites) could use a unique identifier stored in its local storage area to track

a user across multiple sessions, building a profile of the user’s interests to allow

for highly targeted advertising. In conjunction with a site that is aware of the

user’s real identity (for example an e-commerce site that requires authenticated

credentials), this could allow oppressive groups to target individuals with greater

accuracy than in a world with purely anonymous Web usage.

Like RFC 2965, the DOM Storage standard promotes the user agent’s role in protecting

privacy. User agents are given the following suggestions: blocking third-party storage, ex-

piring stored data, treating persistent storage like regular cookies, tracking the origins of

stored data and creating a blacklist or whitelist of websites accordingly. These suggested

approaches to user privacy are unsatisfying for several reasons. First, engaging in any of

these defenses requires substantial knowledge of session-only and persistent data stores. A

user would need an intermediate understanding of state management mechanisms both at

a high level and on a per website basis in order to determine whether DOM storage was

being used benignly or maliciously. Many users browse the web unaware of DOM Storage

and other state mechanisms with tracking potential. It follows that users lack the knowledge

to manage them effectively. Secondly, presuming user understanding of DOM Storage, the

standard does not propose an API or technical implementation of the suggested defenses.

Rather, a user would need the technical expertise to implement a DOM Storage settings

controller in order to realize many of these defenses. Finally, the document appears to excuse

concerns about user privacy by referencing the futile nature of privacy protection. It

suggests that a first-party domain may track user activity and later sell it to a third-party, or

that session-identifying data passed through URLs may be analyzed for user data regardless

of any privacy protections that are in place. Thus DOM Storage poses unaddressed risks to

user privacy.

5.2 Case Study: Gmail Mobile Privacy

Mobile versions of the major web browsers also support the HTML5 standard for local

database storage. Persistent offline client-side storage is especially advantageous for mobile

websites [footnote 2] which frequently face limited bandwidth and inconsistent network connectivity.

This is because keeping large amounts of data on the client device requires fewer requests

for bandwidth-intensive data over a sporadic network connection. As a result, many mobile

[Footnote 2] Here, “mobile websites” refers to mobile versions of regular websites.

websites have been implemented using the HTML5 standard and local database storage.

The Gmail website for the Apple iPhone is one such mobile website, and it provides an

interesting case study in DOM/local storage risks to user privacy.

To see how the Gmail mobile website makes use of local database storage, I needed to

examine the underlying program folders of the iPhone web browser. However, the Safari for

iPhone folder contents cannot be examined on the iPhone itself because the device’s system

folders are locked to users. Thus, I chose to mimic Safari for iPhone using Safari for Mac on

the standard Mac OS. This required a simple change to the Safari developer view and iPhone

user agent context. Logging into gmail.com in Safari for iPhone mode had the following re-

sult: a folder titled “Databases” was silently created within the Safari program folder on

the Mac OS. Within this folder, a management database called “Databases.db” was created

along with a second folder containing storage databases for the domain “mail.google.com.”

In this simulation, though Gmail accessed and wrote to the mobile device, the user was never

prompted for permission or notified of this activity. Along with the privacy concerns de-

scribed below, this local storage creation underscores the failure to achieve informed consent

for tracking under the current privacy paradigm.

The database created within the mail.google.com folder corresponded to my Google pro-

file, and was populated entirely in plain text. Without any kind of encryption or access

security, the database could be opened with a regular SQLite browsing tool. After logging

out of gmail.com and locally opening the database associated with my profile, I was able

to read highly detailed information about the contents of my Gmail account. In particular,

the cached messages and cached conversation headers tables exposed an alarming amount

of personal information (see Figure 5.1). Together, these tables provide information about

frequent contacts, contacts’ addresses, email subject lines, and message contents snippets.

These data may be gleaned for further information such as site login names and passwords.

As an example, I was able to retrieve a site password from a cached message snippet asso-

ciated with a password reset email in my inbox.
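Reading an unencrypted SQLite file of this kind requires nothing beyond a stock SQLite client. The sketch below uses Python's built-in `sqlite3` module to enumerate every table and print its first rows. The in-memory database, the table name `cached_conversation_headers`, and its columns are hypothetical stand-ins rather than Gmail's actual schema; against a real file, the connection would instead be opened on the file's path.

```python
import sqlite3


def dump_database(connection, max_rows=5):
    """Return {table_name: (column_names, first rows)} for every table."""
    contents = {}
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Table names come from sqlite_master itself; quote them as identifiers.
        cursor = connection.execute(
            'SELECT * FROM "{}" LIMIT ?'.format(table), (max_rows,))
        columns = [description[0] for description in cursor.description]
        contents[table] = (columns, cursor.fetchall())
    return contents


# Stand-in for the on-disk file; the table and column names are hypothetical.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE cached_conversation_headers (sender TEXT, subject TEXT, snippet TEXT)"
)
connection.execute(
    "INSERT INTO cached_conversation_headers VALUES (?, ?, ?)",
    ("accounts@shop.example", "Password reset", "Your temporary password is ..."),
)

for table, (columns, rows) in dump_database(connection).items():
    print(table, columns)
    for row in rows:
        print("  ", row)
```

No credentials are involved at any step: whoever can read the file can read its contents.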

Figure 5.1: The cached conversation headers table in my profile database.

This storage mechanism presents a host of privacy concerns. Though the database files are

not visible to other end-users through the iPhone interface, other mobile websites and third-

party advertisers may use and exploit the same local storage area. The W3C working draft

on web storage suggests that a user should restrict access to local storage databases to only

scripts originating from the top-level website to which they navigate. However, where users

lack knowledge about DOM storage, this defense is difficult to implement. Domains may

take steps to privatize their local storage databases by using encryption or other techniques.

However, Gmail’s mobile website suggests that at least some websites storing highly personal

data do not obscure that data from third parties. Moreover, the W3C document suggests

that third-party hosts may use fake domain names in order to gain access to the local storage

databases set by the domain name. Without any kind of host authentication, this could lead

to information leakage or information spoofing activity, both of which can compromise the

confidentiality of user data. In this example, information leakage might occur if an advertiser

read and saved any of the mail.google.com database information available in the Databases

folder. Information spoofing refers to the writing of data in another domain’s local storage.

Here, a third party might set a user’s Gmail mobile session identifier to a known value and

use this to track the user’s interaction with Gmail.

Though this example illustrated the use of DOM storage by mobile websites, the same

features of HTML5 are available for use by regular web browsers. Non-mobile websites

may choose to make use of local storage in a similar manner to Gmail’s mobile website as

DOM Storage is adopted as an Internet standard. Should websites and users fail to protect

access to locally stored databases, third parties may be able to use DOM Storage to connect

browsing history with many kinds of personally identifiable information.

5.3 Community Approach to the Study of DOM Storage

FoxTracks aims to be informational with regard to user privacy threats posed by each third-

party tracking technology. For HTTP cookies and LSOs, I designed interfaces that are

informative and displayed databases and trackers in a way that minimizes confusion. DOM

Storage tracking potential is substantially more difficult to convey using a Firefox extension.

Despite the inclusion of DOM Storage in Firefox 2.0, no add-ons have been developed to

explore its session-only or persistent storage. BetterPrivacy features a boolean option for

clearing DOM contents on browser exit but does not provide a comprehensive view of the

contents or explain how DOM Storage is used by websites.

Though no add-ons have been developed exclusively for viewing DOM Storage contents,

Mozilla’s developer pages highlight that all persistent data resides in “webappsstore.sqlite,”

a single database inside the Firefox user’s profile folder. The FoxTracks interface loads this

database into a table view. However, the database entries are frequently obscure and only

in specific instances will the originating website and other information be intelligible. In

particular, each entry in the database consists of the following fields: scope, key, value,

secure, and owner. Secure is simply a boolean value related to accessibility of the database

entry. Scope and owner refer to the originating website which may be masked or non-

obvious. The key is scope-specific and its significance is not always immediately clear to the

user. The value field is the main storage space of the database entry and may contain user

data that is in human readable form. It may also store scripts that can be accessed and

run from the originating websites. Risks to user privacy can only be demonstrated when

entries’ originating websites and value stores are understandable. Thus, presenting an entire

database view of webappsstore.sqlite is not the most effective demonstration of risks to user

privacy posed by DOM Storage. It has the potential to confuse users who may recognize

only some originating websites and certain pieces of data contained in entries. Moreover, the

database view says nothing about the information leakage and information spoofing potential

of DOM Storage contents (see Figure 5.2).
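A table view over the five fields can be sketched with Python's built-in `sqlite3` module. This is an illustration, not the FoxTracks implementation, which is written in XUL and JavaScript. The table name `webappsstore2` is an assumption about the layout of webappsstore.sqlite, and the sample rows and `.example` domains are invented for the demonstration.

```python
import sqlite3

FIELDS = ("scope", "key", "value", "secure", "owner")


def load_entries(connection, table="webappsstore2"):
    """Read every entry as a dict keyed by the five field names."""
    columns = ", ".join('"%s"' % name for name in FIELDS)
    query = 'SELECT {} FROM "{}" ORDER BY "scope", "key"'.format(columns, table)
    return [dict(zip(FIELDS, row)) for row in connection.execute(query)]


def summarize_by_scope(entries, preview_chars=40):
    """Group entries by scope and shorten long values for a table view."""
    summary = {}
    for entry in entries:
        value = entry["value"] or ""
        if len(value) > preview_chars:
            value = value[:preview_chars] + "..."
        summary.setdefault(entry["scope"], []).append((entry["key"], value))
    return summary


# Stand-in for the profile database; the table name and rows are assumptions.
connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE webappsstore2 ("scope" TEXT, "key" TEXT, "value" TEXT, '
    '"secure" INTEGER, "owner" TEXT)'
)
connection.executemany(
    "INSERT INTO webappsstore2 VALUES (?, ?, ?, ?, ?)",
    [
        ("tracker.example", "uid", "a" * 64, 0, "tracker.example"),
        ("news.example", "lastSection", "sports", 0, "news.example"),
    ],
)

for scope, pairs in summarize_by_scope(load_entries(connection)).items():
    print(scope, pairs)
```

Grouping by scope makes it easier to spot an origin the user does not recognize, although it does nothing to decode keys or values that are meaningful only to the originating site.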

If database entries could be linked to known third-party companies and augmented with

this information, the FoxTracks DOM Storage tab might be more effective. As with third-

party LSO discovery, this improvement requires a reliable, substantial resource for associating

third parties with the names of their DOM entries and the scripts they use to set DOM

entries. To further this end, I intend to work with the technologists at the Center for

Democracy and Technology to begin a DOM contents exploratory project. Following the

Ghostery model for third-party cookie discovery, we intend to uncover third-party DOM

Storage usage by applying a community-based approach. Interested users will be able to

anonymously submit their DOM content for review. This DOM content can be analyzed for

obscure-origin database entries that occur most frequently, and the first-party website DOM

entries with which they tend to appear. Additionally, users will be able to submit comments

about perceived uses of key and value fields for particular websites’ entries. A critical mass

of comments can then be peer-reviewed and facts about popular websites’ uses of DOM

Storage can be posted in a central location. A link to this location will eventually be

accessible through the DOM Storage tab in FoxTracks. With support from the privacy

experts and technologists at CDT, this community-based solution to acquiring, analyzing,

and spreading information about DOM Storage and its role in third-party tracking will lead

to more effective interface design in future versions of the FoxTracks tool.
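The frequency and co-occurrence analysis proposed for submitted DOM content can be sketched as a simple counting exercise. The Python below is a hypothetical illustration: the function name, the representation of a submission as a set of origin strings, and the `.example` domains are all assumptions, and no such dataset exists yet.

```python
from collections import Counter
from itertools import product


def rank_unknown_origins(submissions, known_first_parties):
    """Count how many submissions contain each unrecognized origin, and how
    often each unrecognized origin appears alongside each known first party."""
    frequency = Counter()
    co_occurrence = Counter()
    for origins in submissions:
        origins = set(origins)
        unknown = origins - known_first_parties
        known = origins & known_first_parties
        frequency.update(unknown)
        co_occurrence.update(product(unknown, known))
    return frequency, co_occurrence


# Each submission is the set of origins found in one user's DOM Storage.
submissions = [
    {"news.example", "x1.tracker.example"},
    {"shop.example", "x1.tracker.example", "y2.tracker.example"},
    {"news.example", "shop.example", "x1.tracker.example"},
]
known = {"news.example", "shop.example"}

frequency, co_occurrence = rank_unknown_origins(submissions, known)
print(frequency.most_common(1))  # prints [('x1.tracker.example', 3)]
print(co_occurrence[("x1.tracker.example", "news.example")])  # prints 2
```

Origins that rank highly across many unrelated submissions are natural first candidates for manual review.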

Figure 5.2: Screenshot of the FoxTracks DOM Storage panel.

Chapter 6

Results

6.1 The FoxTracks Implementation

FoxTracks demonstrates how third-party HTTP cookies, Flash cookies, and DOM Storage

contents can adversely affect the privacy of an end-user. FoxTracks relies on a Ghostery

database of known trackers to identify third-party HTTP cookies loaded through the HTML

of a web page. As a Firefox add-on, FoxTracks has access to the first-party domain a user is

visiting when a third-party script attempts to get or set data on the user’s machine. Every

time third-party HTTP cookie activity is recognized on a page, FoxTracks keeps a record of

the third party and the website on which it appeared. When a user opens the tool, a XUL-

generated interface populates a table with all of these records; and a user is given insight into

the profiles different trackers have assembled from her browsing activity. These “snapshots”

of partial browsing history are identical to the browsing profiles kept on the servers of third

parties. They are completely independent from the user-controlled browser history and, as

such, demonstrate a loss of privacy control to the end-user. In this way, FoxTracks achieves

its aim to inform users about the privacy risks of third-party HTTP cookies.
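The record-keeping described above amounts to logging pairs of third party and first party and then grouping them by third party. The following Python sketch illustrates that data structure under stated assumptions: the `TrackerLog` class and its methods are hypothetical, and the actual add-on is implemented in XUL and JavaScript.

```python
from collections import defaultdict
from datetime import datetime, timezone


class TrackerLog:
    """Stores one record per observed third-party cookie event."""

    def __init__(self):
        self._records = []

    def record(self, third_party, first_party, when=None):
        when = when or datetime.now(timezone.utc)
        self._records.append((third_party, first_party, when))

    def profiles(self):
        """Return {third_party: [first-party sites, in order first seen]}."""
        profiles = defaultdict(list)
        for third_party, first_party, _when in self._records:
            if first_party not in profiles[third_party]:
                profiles[third_party].append(first_party)
        return dict(profiles)


log = TrackerLog()
log.record("ads.example", "news.example")
log.record("ads.example", "shop.example")
log.record("stats.example", "news.example")
log.record("ads.example", "news.example")  # repeat visit, no new site

for tracker, sites in log.profiles().items():
    print(tracker, sites)
```

Each list in the result approximates the slice of browsing history that a single third party could assemble, which is the view the interface presents to the user.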

The identities of third parties that use HTTP cookies, the information collected by cook-

ies, and even the scripts used to set cookies are well-documented in the public domain. On

the other hand, the third-party risks of Flash cookies and DOM Storage exist largely as hypo-

thetical information leakages that are periodically supported by specific instances of privacy

invasion. It was difficult to show how third parties use Flash cookies and DOM Storage

in FoxTracks without resources like the Ghostery advertiser database for these technologies.

For this reason, I chose to implement more general Flash cookie and DOM Storage interfaces

and place greater emphasis on the accessibility of these interfaces. FoxTracks provides a file

view of all Flash cookies on a user’s machine. In most cases a Flash cookie’s origin and

purpose are discernible from metadata fields. While these fields do not distinguish first-party

Flash cookies from third-party Flash cookies, users are likely to recognize origin domains

they have never explicitly visited. In this way, even a window into all Flash cookie files can

expose third-party tracking with Flash objects in a manner that is personally relevant to the

user. FoxTracks also provides single-object and all-object deletion options with the aim of

encouraging users to browse the informational web links embedded in the interface prior to

use. The user-friendliness of the Flash objects interface is also increased by the extraction

of advanced deletion methods to a separate options menu.

The DOM Storage tab of FoxTracks also places an emphasis on user accessibility. How-

ever, because DOM Storage takes the form of a SQLite database in the Firefox web browser,

a display of its contents is only partially telling for users. When origin fields are readable,

users may find database entries that have been set by third parties. When storage con-

tents and origin names are only readable by remote servers, FoxTracks is not as effective

in informing users about third-party tracking with DOM Storage or the associated privacy

risks. Nonetheless, an overview of DOM Storage is a valuable addition to the software. By

including all three tracking technologies, FoxTracks achieves an all-in-one overview of track-

ing practices on the web. For interested users, an all-in-one resource provides a holistic,

straightforward introduction to online privacy.

While FoxTracks succeeds in being educational and informative, it stands to benefit from

a number of code-based improvements. The FoxTracks interface was implemented in XUL

and program functionality was added through JavaScript functions included in the standard

Mozilla Firefox development API. Many of the functions called in the software have single-

threaded and multi-threaded implementations. To avoid increases in program complexity,

multi-threaded functions were not used in the initial development of FoxTracks. However, use

of multi-threaded database functions would significantly improve the performance of both the

HTTP cookie and DOM Storage tabs by allowing SQLite queries that load database entries

into tables to be executed in parallel. While this optimization is secondary to the goals

of FoxTracks, slow program execution impacts the user’s browsing experience in a negative

manner. If FoxTracks suffers from serious latency or prevents the user from browsing the

web at a regular pace, users are unlikely to keep or use the add-on. It follows that future

development of FoxTracks should consider performance improvements.

The most prominent limitation of FoxTracks is its inability to provide information solely

on third-party Flash cookies and DOM Storage contents. As discussed in Section 5.3, Fox-

Tracks would benefit enormously from community-based input and research. Information

about third parties known to use these technologies and the scripts that set them would

provide a basis for identifying additional third-party trackers on the web. Though I plan

to work with CDT to address the lack of information about DOM Storage, such research

might also be carried out in an academic setting or by other public interest organizations.

Interested users that analyze their DOM contents for third-party activity and share their

results can also further the development of informative software tools.

6.2 Third Parties as Privacy Adversaries

Third parties that set and use HTTP cookies and Flash cookies for tracking purposes are

primarily players in the online advertising business. Many of these companies aggregate user

browsing data to serve relevant advertisements to users. Within the privacy discourse, there

is debate about the merits of this “behavioral advertising.” Some claim that behavioral

advertisers provide useful content for online consumers. Others cite the privacy-eroding

tracking methodologies of behavioral advertisers. I apply a computer security framework to

the three tracking technologies discussed in this thesis to show how third-party advertisers

might be considered “adversaries” to Internet users.

In the computer security literature, an adversary is an entity whose aim is to prevent

users of a cryptosystem [footnote 1] from achieving a goal such as data confidentiality or integrity.

Adversaries’ actions typically include attempts to uncover secret data, corrupt data, spoof

communication messages and message sender identities, and force system failures [14]. The

concept of an adversary is used to reason about cryptosystems as “games” between users and

coordinated attackers. Web browsing can be considered a game between an Internet user

and the websites she visits, where data passed to these websites is intended to be private.

In this game, HTTP cookies used by third parties behave like passive adversaries in formal

cryptosystems. Specifically, HTTP cookies observe and record sessions between a user and

first-party website, and use this information to glean facts about the user. Third parties using

Flash cookies and DOM Storage may also behave like passive adversaries but have immense

potential to be active adversaries that spoof, corrupt, and divert communication between

users and first-party websites. Flash cookies in particular have been found to respawn HTTP

cookies, which constitutes a type of message spoofing since a user’s machine establishes an

HTTP cookie communication channel with a third-party server where none should exist.

Both third-party Flash cookies and DOM Storage contents have the ability to intercept and

fabricate users’ communications with first-party websites, resulting in information leakage

and information spoofing as described in the W3C web storage standard. This kind of

action typifies active adversaries as they are described in the computer security paradigm.
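The respawning behavior described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not code from FoxTracks or from any real tracker: the function `tracker_visit`, the keys `uid` and `uid_backup`, and the identifiers are all hypothetical, and two plain dictionaries stand in for the browser's HTTP cookie store and the Flash LSO store.

```python
# Toy model of cookie respawning. Nothing here is a real browser or Flash API;
# the two dictionaries simply stand in for two independent client-side stores.

def tracker_visit(http_cookies, flash_lsos, new_id):
    """Return the identifier a hypothetical third party sees on this visit."""
    uid = http_cookies.get("uid")
    if uid is None:
        # The HTTP cookie is missing (first visit, or the user deleted it).
        # Fall back to the copy kept in Flash storage; otherwise use a new ID.
        uid = flash_lsos.get("uid_backup", new_id)
        http_cookies["uid"] = uid      # respawn the HTTP cookie
    flash_lsos["uid_backup"] = uid     # keep the backup copy in sync
    return uid


http_cookies, flash_lsos = {}, {}
first = tracker_visit(http_cookies, flash_lsos, new_id="id-001")

http_cookies.clear()                   # the user clears browser cookies only
second = tracker_visit(http_cookies, flash_lsos, new_id="id-002")

print(first, second, first == second)  # id-001 id-001 True
```

In this model, deleting only the HTTP cookies does not change the identifier the third party sees, which is why respawning defeats a user's attempt to control tracking through the browser's cookie settings alone.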

In sum, the use of HTTP cookies, Flash cookies, and DOM Storage by third parties can

be translated precisely into a computer security context. Within this context, tracking

technologies represent means by which a third party attempts to break data confidentiality between

a user and the first-party websites they visit. This allows me to characterize third parties

as adversaries in the scheme of online privacy. Moreover, my research provides peripheral

evidence of the actively invasive nature of third-party advertisers. Public comments, such as

the United Virtualities statement on the tracking potential of Flash LSOs, demonstrate how

advertisers as a whole have tried to circumvent user attempts to control privacy. Business

successes of advertisers and data aggregators that use Flash cookies also demonstrate the

economic incentives in place for third parties that battle user control of privacy. The

characterization of third parties as adversaries to individual users reaffirms the need for greater

user awareness of online tracking practices and a privacy baseline of informed consent.

¹ A cryptosystem is any computer system that involves cryptographic techniques.

Chapter 7

Conclusions

I implemented the FoxTracks software tool to increase awareness about third-party tracking

and survey common tracking methodologies. FoxTracks was designed to be accessible

and informational for average Internet users who, according to survey data, have incomplete

knowledge of third-party tracking. FoxTracks examines the roles of HTTP cookies, LSOs,

and DOM Storage in third-party tracking activities. For each technology, the tool provides

information about the identities of third parties, how they use the technology to undertake

tracking, what kinds of personal data can be exposed, and how users can opt out of tracking.

Based on the existing literature and code for each tracking technology, FoxTracks provides

these pieces of information through different interface implementations. The HTTP cookies

and web bugs panel demonstrates how a third party tracks a user across multiple websites

to compile a profile of the user’s browsing history. The Flash objects panel displays both

first-party and third-party LSOs, and emphasizes learning about Flash cookies prior to using

the FoxTracks deletion options. The DOM Storage panel provides a basic database view

into the user’s DOM Storage and strongly emphasizes interaction with the informational

links included in the interface. Together, the three panels provide a novel, holistic survey of

third-party tracking on the web.
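The cross-site profiling that the HTTP cookies and web bugs panel demonstrates can be sketched as follows. This is an illustrative sketch only, not the FoxTracks implementation: the function `build_profiles`, the request log, and the `.example` hostnames are hypothetical. It assumes that each request for an embedded resource reaches the third-party server with that party's cookie identifier and a Referer header naming the first-party page.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_profiles(request_log):
    """Group the first-party hosts seen in Referer headers by cookie ID."""
    profiles = defaultdict(list)
    for cookie_id, referer in request_log:
        host = urlparse(referer).hostname
        if host is None:
            continue                       # no usable Referer on this request
        if host not in profiles[cookie_id]:
            profiles[cookie_id].append(host)
    return dict(profiles)


# Hypothetical requests received by a single third-party server, each one
# triggered by a web bug or advertisement embedded in a first-party page.
request_log = [
    ("id-001", "http://news.example/politics"),
    ("id-001", "http://shop.example/shoes"),
    ("id-002", "http://news.example/sports"),
    ("id-001", "http://health.example/symptoms"),
]

profiles = build_profiles(request_log)
print(profiles["id-001"])  # ['news.example', 'shop.example', 'health.example']
```

The user behind `id-001` never navigated to the third party, yet the third party holds an ordered list of the first-party sites that user visited.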

The design choices in FoxTracks were informed by my analysis of the history, key players,

and motivations of third-party trackers. A number of significant patterns emerged from this

analysis. In both the HTTP cookie standard and the DOM Storage working document,

standards organizations highlighted the serious privacy risks posed by third-party use of

the technologies. Nonetheless, both documents place a strong emphasis on active user

self-management of privacy. The current self-management privacy default, combined with a

lack of user awareness about privacy risks, provides the basis for substantial tracking by

third parties. Judging from the Ghostery database of known third parties and an examination

of the companies using Flash cookies, the majority of third parties appear to be advertising

companies and behavioral data aggregators. Because their business models rely on large

amounts of accurate user profiling, these companies have economic incentives to circumvent

user attempts to control privacy. Moreover, venture funding and industry recognition show that

they are encouraged to continue privacy-eroding practices such as HTTP cookie respawning.

Applying a computer security rubric to the sum of this analysis yields qualitative results

about the nature of third parties. In particular, the tracking technologies and methods

employed by third parties parallel the actions of adversaries in a computer security

model. This suggests that third parties intentionally circumvent user efforts to control

privacy. Accordingly, third parties and third-party tracking should be considered a serious

privacy risk to users.

By applying my analysis to the FoxTracks tool, I was able to make the software conceptually

true to its goals of accessibility, informational quality, and third-party tracking exposure.

However, there is substantial room for revision of the software and a need for interface testing.

Further research directions that would also enhance FoxTracks include an examination

of DOM Storage contents or a focused study of Flash cookies. FoxTracks might separately

inspire similar privacy-enhancing tools that explore different web technologies or convey information

in creative ways. Alternatively, work might be undertaken on a holistic survey

of security-enhancing technologies, especially as standards promote user self-management of

online privacy. Further research in any of these directions would be supported by FoxTracks

and the accompanying analysis, and would in turn further the tool’s goal of achieving a

privacy standard of informed consent.

Acknowledgements

I would like to express my sincere thanks to Professor John Mitchell and Professor David

Dill for their guidance on this research project.

My deepest thanks also go to the team at CDT Labs, who advised me on technical

matters and provided online support for the project.

Bibliography

[1] DOM Storage. https://developer-stage.mozilla.org/en/DOM/Storage, April 2010.

[2] VideoEgg. http://www.crunchbase.com/company/videoegg, March 2010.

[3] M. Ackerman, L. Cranor, and J. Reagle. Privacy in e-commerce: examining user scenarios

and privacy preferences. Proceedings of the 1st ACM conference on Electronic

commerce, pages 1–8, May 1999.

[4] M. Chambers. Macromedia Flash MX Security. Macromedia whitepaper describing the

information accessible to LSOs, March 2002.

[5] R. Hof. Ad networks are transforming online advertising. BusinessWeek, March 2009.

[6] D. Kristol. HTTP cookies: Standards, privacy, and politics. ACM Transactions on

Internet Technology, 1(2):151–198, 2001.

[7] D. Kristol and L. Montulli. RFC 2109: HTTP state management mechanism. Internet

Engineering Task Force, Network Working Group, February 1997.

[8] W. Maes, T. Heyman, L. Desmet, and W. Joosen. Browser protection against cross-site

request forgery. Proceedings of the first ACM workshop on Secure execution of untrusted

code, pages 3–10, November 2009.

[9] Jonathan R. Mayer. “Any person... a pamphleteer”: Internet Anonymity in the Age of

Web 2.0. Woodrow Wilson School undergraduate thesis containing relevant survey data

about perceptions of third-party tracking on the web, April 2009.

[10] G. Nowak and J. Phelps. Understanding privacy concerns: an assessment of consumers’

information-related knowledge and beliefs. Journal of Direct Marketing, 6(4):28–39,

August 2006.

[11] W. Peng and J. Cisna. HTTP cookies: a promising technology. Online Information

Review, 24(1):150–153, April 2000.

[12] B. Pfitzmann and M. Waidner. Privacy in browser-based attribute exchange. Proceedings

of the 1st ACM conference on Electronic commerce, 154(3):52–62, November 2002.

[13] A. Soltani, S. Canty, Q. Mayo, L. Thomas, and C. J. Hoofnagle. Flash Cookies and

Privacy. UC Berkeley survey of flash cookie adoption on the web and related privacy

concerns, August 2009.

[14] Douglas R. Stinson. Cryptography Theory and Practice, pages 355–363. Chapman &

Hall/CRC, third edition, 2006.

[15] United Virtualities. United virtualities develops id backup to cookies, browser-based

persistent identification element will also restore erased cookie. March 2005.
