Big Data - Security Concerns

Gigabytes gone wild

If we don’t balance the human values that we care about with the compelling uses of big data, our society risks abandoning them for the sake of mere innovation or expediency.

These days, everyone seems to be talking about “big data.” Engineers, researchers, lawyers, executives and self-trackers all tout the surprising insights they can get from applying math to large data sets. The rhetoric of big data is often overblown, exaggerated and contradictory[1], but there’s an element of truth to the claim that data science is helping us to know more about our world, our society and ourselves.

Data scientists use big data to deliver personalized ads to Internet users, make better spell checkers and search engines, predict weather patterns, perform medical research, learn about customers, set prices and plan traffic flow. Big data can also fight crime, whether through the use of automated license-plate readers or, at least theoretically, through the collection of vast amounts of “metadata” about our communications and associations by the National Security Agency.

Big data allows us to know more, to predict and to influence others. This is its power, but it’s also its danger. The entities that can harness the power of math applied to large sets of personal information can do things that used to be impossible. Many of these new uses are good, but some of them aren’t. For example, if our “personalized prices” can be based on our race or sex, or if our college admissions are based on things like ZIP code or car ownership, we might want to think more deeply about the kinds of big decisions our big data can be used for. We’re creating a society based on data, and we need to make sure that we create a society that we want to live in.

The values we build or fail to build into our new digital structures will define us. If we don’t balance the human values that we care about (such as privacy, confidentiality, transparency, identity and free choice) with the compelling uses of big data, our society risks abandoning them for the sake of mere innovation or expediency.

We think the answer lies in a conversation about the ethics of big data. What should we allow it to do for us, and why? Big data has allowed the impossible to become possible, and it has outpaced our legal system’s ability to control it. This is understandable, as our elected officials don’t pass laws to regulate things that aren’t possible. We need to talk about big data ethics, and we think four facts should guide our discussion.

Big data ethics

First, when we talk about decisions based upon personal data, we need to realize that privacy rules are necessary. Some people might argue that privacy is dead in an age of information, but nothing could be further from the truth. Privacy isn’t just about keeping things hidden; it’s about the rules we use to govern information. Look at the “privacy policies” of even big data companies: these tell you not just what information gets collected about you, but how it is used and when it can be destroyed.

Second, we need to realize that even shared personal information can be protected. When you go to see doctors or lawyers, you don’t expect that the information you give them is theirs to use any way they want. The information is confidential: We confide in them so they can help us, and it’s the promise of confidentiality that lets us trust them enough to tell them everything they need to know, even if it’s embarrassing or sensitive. This essential trust is backed up by laws as well as professional rules of ethics. We don’t think of this information as “public” or “nonprivate,” and we can think about much of the data gathered about us the same way, whether it’s the websites we visit, the books we read or the places we go that our digital devices track automatically. Amazon, Apple, our ISP or our mobile phone carrier might need to know this information to help us go about our days, but that doesn’t mean this data is “public” or that it should be beyond our control.

Third, big data requires transparency. If important decisions are being made about us based on an algorithm and data, we have a right to know how the algorithm works and what data is being used. It’s outrageous that while big data has allegedly eliminated privacy, many of the ways it’s used are themselves shrouded in secrecy. This has things entirely the wrong way around. If we’re to build a society through decisions based upon data, we should know how they work, especially when those decisions will affect our daily lives, privacy and social opportunities.

Finally, we should recognize that big data can compromise identity, our right to decide who we are. If we’re constantly sorted and nudged by big-data-based decisions in areas from our choice of books to our voting habits, we risk letting the powerful entities in our lives determine who we are before we even know ourselves. We need to think imaginatively about the kinds of data inferences and data decisions we will allow. We must regulate or prohibit ones we find corrosive, threatening or offensive, just as we’ve long protected decisions surrounding voting and contraception and prohibited invidious decisions made upon criteria such as race, sex or gender.

A new framework

How should we make sure that big data ethics gets built into our digital future? Law should certainly be part of the answer, and despite the claims of some technologists, law can work here. For example, the federal Fair Credit Reporting Act effectively regulates the credit reporting agencies’ use of big data to generate consumer credit reports and calculate consumer credit scores. In fact, the FCRA has regulated[2] growing uses of big data in this context since 1970. (Some kinds of big data are really old.) As big data’s analytical tools become more common in our society, we should extend similar legal protections to other essential areas as well.

But law alone cannot solve these problems. As a society, we need to talk about big data ethics. Are we comfortable using race or proxies for race to price goods or allocate government benefits such as school funding or welfare payments? What about using big data inferences to decide college admissions or lawsuits, to investigate crimes or impose criminal sentences? As scholars, we certainly have our own moral views on these questions (as do many data scientists), but if we’re building a society in which data science is deployed more often, we need to talk as a society about what we will allow and what we won’t. In this respect, the White House’s initiative[3] to study the technological, legal and ethical implications of big data is a good first step. But we need to do more.

We need to establish social norms for the use of data to make decisions about people, and for the rights that people have to understand and dispute those decisions, just as we established norms for safe working conditions in the wake of the Industrial Revolution and norms for the allocation of government services and benefits at the dawn of the welfare state. When we do this, software designers and engineers need to be at the center of the conversation. Individual users certainly have a responsibility to act carefully when their data is at stake, but users alone can’t bear the whole burden. We need to build structures that encourage ethical data usage rather than merely incentivizing individual consumers to share as much as possible for as little as possible in return.

We must build these structures, such as in-house ethicists or review boards, into government and private entities that use big data. Such proposals might seem far-fetched, but they are already starting to take hold. For decades, university scientists wishing to perform experiments (whether physical or data-based) on human subjects have had to submit their research projects to institutional review boards[4], in-house panels that ensure that scientific tools are deployed ethically and for the benefit of human beings. And many leading corporations have started to take steps along these lines, such as the widespread growth of chief privacy officers as senior corporate executives or experiments such as Google’s ethical review board or ethicists-in-residence. If we’re building a data-science revolution, let’s make sure it’s a revolution we want: one that makes society better as well as making companies richer.

Big data ethics begins as a state of mind, before it becomes a set of mandates. While engineers in particular must embrace the idea of big data ethics, in an information society that cares about privacy, confidentiality, transparency and identity, we must all be part of the conversation, and part of the solution. Big data ethics is for everyone.

This op-ed is adapted from "Big Data Ethics[5]," an essay by Neil M. Richards and Jonathan H. King in the Wake Forest Law Review (forthcoming 2014). You can access the paper online here[6].

1. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2325537
2. http://www.futureofprivacy.org/wp-content/uploads/LEGAL-Hoofnagle-How-FCRA-Regulates-Big-Data.pdf
3. http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
4. http://www.stanfordlawreview.org/online/privacy-and-big-data/consumer-subject-review-boards


