1

Technical Issues Concerning The Use Of Personal Data On The Internet

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath, BA2 7AY

Email: [email protected]
URL: http://www.ukoln.ac.uk/

UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee of the Higher Education Funding Councils, as well as by project funding from the JISC’s Electronic Libraries Programme and the European Union. UKOLN also receives support from the University of Bath where it is based.


2

Contents

About UK Web Focus

Personal Information and the Internet
• End User issues
• Information Provider issues
• System Administrator issues
• Management issues

Solutions
• Technical
• Protocol developments
• Organisational

Conclusions

3

About UK Web Focus

UK Web Focus:
• Three-year post funded by JISC
• Provides advice and support to the UK HE community on Web matters
• Activities:
– Monitoring web developments
– Talks and presentations (e.g. Technical Threats to Copyright and IPR at Talisman seminar on Legal Risks on the Internet in January 1998)
– Representing JISC on the World Wide Web Consortium (W3C)
– Other related activities

4

Personal Data and The Internet

What are the issues peculiar to the Internet?
• Junk email
• Big brother
• Searching
• Log files
• Preventing misuse
• Liability
• Central policies vs. departmental action
• Student use
• Confidentiality
• Ease of use
• Ability to reuse data

End Users

Information Providers

Systems Managers

Management

What Else?

5

End User

What are the privacy implications for end users of Internet services, such as:
• A user of email or Usenet News
• A student who uses a public access PC to access Web resources
• A member of staff who uses a PC in his office

6

Mailing Lists and Usenet

Mailing Lists

Mailbase, for example, provides search facilities for finding:
• Membership of lists
• Details of postings
See <URL: http://www.mailbase.ac.uk/search.html>

Usenet News

Usenet News articles:
• Are archived
• Can be searched
See <URL: http://www.altavista.digital.com/>

7

Institutional Mailing Lists

• Many institutions use the HyperMail software to archive internal mailing lists

• Robot software can index these archives.

Using ACDC to search for "Brian Kelly" reveals contributions to mailing lists.

8

UK Directory Services

Many Universities run an X.500 directory service

• X.500 is a distributed directory protocol

• Originally dedicated clients were used to access X.500

• Now it's much easier using the Web

http://www.brunel.ac.uk/x500/search-form-gb.html

9

Finding People

Various other directory searching services are available:

• Whowhere <URL: http://www.whowhere.com/>
• BigFoot <URL: http://www.bigfoot.com/>
• IAF (Internet Address Find) <URL: http://www.iaf.net/>

Advertising revenue can make these a commercial proposition

10

Ahoy!

Ahoy! is a research project which uses AI techniques to find (a small number of) personal home pages

AI techniques will make it easier to find personal information

http://www.ahoy.cs.washington.edu:6060/

11

Web Browsers and Privacy

Client Caches

Web browsers store viewed resources in a local cache (on the hard disk or on a network drive).

These resources can be re-used.

Potentially these files could be accessed by other users of the PC or by a system administrator.

12

Web Browsers and Privacy

Cookies

Cookies enable information to be stored on your local PC which can be reused by the remote server.

Cookies are useful in applications such as "shopping baskets", CBL, etc.

However, there are privacy implications, since cookies can be used to record a user's path through a website.
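The path-recording concern can be illustrated with a minimal sketch. It assumes a hypothetical server that issues each new visitor a `visitor_id` cookie and logs every page requested under that identifier. The cookie name, the `handle_request` function and the in-memory `paths` store are illustrative assumptions, not taken from any particular server.

```python
from collections import defaultdict
from http.cookies import SimpleCookie
from itertools import count

# Hypothetical in-memory store: visitor id -> list of pages requested.
paths = defaultdict(list)
_next_id = count(1)


def handle_request(page, cookie_header=""):
    """Simulate a server handling one request.

    Returns the Set-Cookie value to send back, or None if the browser
    already presented a visitor_id cookie.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if "visitor_id" in cookie:
        visitor = cookie["visitor_id"].value
        set_cookie = None
    else:
        # New visitor: allocate an identifier and ask the browser to keep it.
        visitor = str(next(_next_id))
        reply = SimpleCookie()
        reply["visitor_id"] = visitor
        set_cookie = reply["visitor_id"].OutputString()
    paths[visitor].append(page)
    return set_cookie


# First visit: no cookie yet, so the server issues one.
issued = handle_request("/index.html")
# The browser returns the cookie on later requests.
handle_request("/courses.html", issued)
handle_request("/apply.html", issued)
print(paths["1"])
```

Nothing in the sketch identifies the person by name, but the server can still reconstruct the sequence of pages each browser visited.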

13

Information Providers

What personal information is provided on the web?

Corporate Information

Individual / Societies

14

Changing Context

Technologies such as Frames can change the context of resources on the web by:
• Pointing to text
• Pointing to graphics

There has reportedly been a "Babes on the Web" page.

Document held remotely

15

Web Forms

• Web forms are now trivial to set up
• Save time and effort
• Information may be reused easily
• Are information providers aware of the implications of reusing information?

16

System Administrators

System Administrators can:

• Read incoming and outgoing messages and Usenet postings

• Analyse cache log files to find popular websites - and potentially who's been accessing them

• Deny access to specified websites

• Publish statistics on hits to pages

17

Web Statistics

Many web administrators publish their web statistics:

• Access by country
• Access by domain name
• Most popular pages
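Statistics of this kind are typically derived from the server's access log. The following is a minimal sketch, assuming the log is in Common Log Format and that host names have already been resolved; the sample entries and the `summarise` function are made up for illustration.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*" \d{3} \S+$'
)


def summarise(lines):
    """Return (hits per top-level domain, hits per page) for CLF lines."""
    by_domain = Counter()
    by_page = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        host, page = match.groups()
        # Treat the last label of a resolved host name as its domain;
        # unresolved numeric addresses are grouped under "unresolved".
        last = host.rsplit(".", 1)[-1]
        domain = "unresolved" if last.isdigit() else last
        by_domain[domain] += 1
        by_page[page] += 1
    return by_domain, by_page


# Made-up sample entries for illustration only.
sample = [
    'pc1.bath.ac.uk - - [10/Feb/1998:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    'pc2.bath.ac.uk - - [10/Feb/1998:10:01:00 +0000] "GET /stats/ HTTP/1.0" 200 512',
    'host.example.com - - [10/Feb/1998:10:02:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.7 - - [10/Feb/1998:10:03:00 +0000] "GET /index.html HTTP/1.0" 304 -',
]
domains, pages = summarise(sample)
print(domains.most_common())
print(pages.most_common(1))
```

The published summary is aggregated, but the underlying log records one host per request, which is where the privacy question arises.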

18

Restricting Access

It is possible to restrict access to sites containing dubious content

It is also possible to record the user's email address and take action if persistent access is attempted.

Is this:
• Sensible action?
• Breach of privacy?

19

Solutions

There are a variety of solutions to the issues concerned with Personal Data and the Internet:

• Don't use the Internet
• Information providers' "tricks"
• System administrators' "tricks"
• Protocol Developments
• Auditing

Education is important throughout

20

Solutions - Denying Access

• Information published on the web can be easily processed by robots
  (an Alta Vista search for "Brian Kelly" gives 2,800 hits)
• (Well-behaved) robots can be prevented from accessing resources using the Robot Exclusion Protocol (REP) (robots.txt file)

But:
• Not widely used: ~30% of UK universities
• Not easily scalable (single file at web root)

Example robots.txt file:
User-agent: *
Disallow: /stats/
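As a minimal sketch of how a well-behaved robot might honour such a file, Python's standard `urllib.robotparser` module can be given the two directives above. The robot name `ExampleBot` and the host `www.example.ac.uk` are placeholders, not real services.

```python
from urllib.robotparser import RobotFileParser

# The example file from the slide, with the directives on separate lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and the host name are placeholders for illustration.
for url in ("http://www.example.ac.uk/stats/1998.html",
            "http://www.example.ac.uk/index.html"):
    allowed = parser.can_fetch("ExampleBot", url)
    print(url, "allowed" if allowed else "disallowed")
```

The check is voluntary: a robot that never consults the file is not restricted by it.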

21

Solutions - For Info Providers

• REP is implemented by the system administrator
• Possible (but not easy?) to create a master robots.txt file by merging departmental ones
• The HTML 4.0 <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> element enables individual files to contain robot directives
  – New and not yet widely supported
• Since robots tend not to follow CGI programs, information could be hidden behind a button
  – Not elegant

Example page: Conference Details (Participants, Campus map)

<FORM ACTION="part.html"><INPUT TYPE="submit" VALUE="..."></FORM>
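Merging departmental files into a master robots.txt could be sketched as below. The sketch assumes each department writes its Disallow paths relative to its own URL prefix and uses only `User-agent: *` records; both conventions, and the `merge_robots` function itself, are hypothetical.

```python
def merge_robots(departmental):
    """Merge departmental exclusion rules into one root-level file.

    `departmental` maps a department's URL prefix (e.g. "/physics") to the
    text of a robots.txt-style file whose Disallow paths are written
    relative to that prefix. Only "User-agent: *" records are handled.
    """
    merged = []
    for prefix, text in sorted(departmental.items()):
        applies = False
        for raw in text.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments
            if ":" not in line:
                continue
            field, value = (part.strip() for part in line.split(":", 1))
            field = field.lower()
            if field == "user-agent":
                applies = (value == "*")
            elif field == "disallow" and applies and value:
                # Rewrite the path so it is relative to the web root.
                merged.append(prefix.rstrip("/") + "/" + value.lstrip("/"))
    lines = ["User-agent: *"] + ["Disallow: " + path for path in merged]
    return "\n".join(lines) + "\n"


# Hypothetical departmental files.
master = merge_robots({
    "/physics": "User-agent: *\nDisallow: /staff/\n",
    "/library": "User-agent: *\nDisallow: /stats/\nDisallow: /tmp/\n",
})
print(master)
```

Records aimed at specific robots are ignored here, which hints at why the slide calls the merge "not easy".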

22

Preventing Misuse

There are technical ways of:
• Preventing resources from being used in frames
• Preventing images from being "stolen"

Solutions are being considered mainly for copyright protection

However, such solutions aren't widely deployed as:
• They may prevent the resource from being reused in valid ways
• No user / political pressure?
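The slide does not name a specific technique. One commonly used approach to stop images being embedded by other sites is to check the HTTP Referer header before serving them. A minimal sketch under that assumption follows; the host name and the `may_serve_image` function are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical host names allowed to embed this site's images.
ALLOWED_HOSTS = {"www.example.ac.uk"}


def may_serve_image(referer):
    """Decide whether to serve an image given the request's Referer header.

    An absent Referer is allowed, since some browsers and proxies omit it;
    this is a deterrent, not real protection.
    """
    if not referer:
        return True
    host = urlsplit(referer).hostname
    return host in ALLOWED_HOSTS


print(may_serve_image("http://www.example.ac.uk/courses.html"))
print(may_serve_image("http://other.example.com/framed.html"))
print(may_serve_image(None))
```

The check also refuses legitimate reuse from any host not on the list, which is the drawback noted in the first bullet above.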

23

Political Developments

Global Information Networks

• European Conference in Bonn in June 97
• Raised issues of:
– Data protection
– Technological solutions

• See <URL: http://www2.echo.lu/bonn/conference.html>

24

W3C Response

World Wide Web Consortium (W3C) responded to Bonn paper:

• Summarised technological solutions:
– DSig: a web of trust
– PICS: content selection without censorship
– P3P: privacy project
– IPR: intellectual property rights

• See <URL: http://www.w3.org/TR/NOTE-eu-conf-970711>

25

DSig

DSig:
• W3C's Digital Signature Initiative
• Helps users to decide who to trust
• Based on digitally signed assertions:

"This web page comes from Bath University Courses office and gives a legally binding list of courses"

• See <URL: http://www.w3.org/Security/DSig/Activity.html>

26

PICS

PICS:
• Platform for Internet Content Selection
• Mechanism for rating web pages, e.g. X, A, PG, U

• Decision to accept resource made by end user (or end user organisation)

• Choice devolved - no censorship of originating resource

• See <URL: http://www.w3.org/PICS/>
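The idea that the decision rests with the end user can be sketched as follows. This is not PICS label syntax. It assumes each resource carries one of the ratings named above and that the ratings can be ordered from least to most restricted, and both the ordering and the `accept` function are assumptions made for illustration.

```python
# Ratings from the slide, ordered from least to most restricted.
# The ordering is an assumption made for this illustration.
RATING_ORDER = ["U", "PG", "A", "X"]


def accept(resource_rating, user_limit):
    """Return True if a resource's rating is within the user's chosen limit.

    Unrated resources (rating None) are rejected here; a real client could
    equally choose to accept them. That choice belongs to the end user.
    """
    if resource_rating is None:
        return False
    return RATING_ORDER.index(resource_rating) <= RATING_ORDER.index(user_limit)


# The same resource is accepted or rejected depending on who is asking.
print(accept("A", user_limit="PG"))
print(accept("A", user_limit="X"))
```

The originating resource is unchanged in both cases; only the receiving side's policy differs.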

27

IPR

W3C's IPR activity:
• Intellectual Property Rights and the Web:
– Does use of a cache infringe copyright?
– Can links to resources be made freely?
– …
• Asks the contentious question:
  Does the nature of the technology require us to change the legal understanding or status of copyright as it stands now?

• See <URL: http://www.w3.org/IPR/Activity.html>

28

P3P

P3P:
• Platform for Privacy Preferences

• Will develop a specification and demonstration of a way for Web sites and users to express privacy practices and preferences

• Architecture and grammar work complete (Oct 1997)

• See <URL: http://www.w3.org/Privacy/Activity.html>

29

P3P Deliverables

General Overview of the P3P Architecture
• Document describes the P3P model

Grammatical Model
• Grammar and vocabulary for machine-readable statements:
– Data Categories: e.g. name, email, ...
– Practices:
    Use: e.g. system admin, research, customisation
    Transfer: divulge information within organisation
    Release: divulge info to other organisation
    Access: ability of data subject to view information

See <URL: http://www.w3.org/TR/NOTE-IPWG-Practices.html>
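The vocabulary above suggests how a site's stated practice might be compared with a user's preference. The following is a minimal sketch only: the class names, field names and matching rules are hypothetical and do not reproduce the P3P grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Practice:
    """What a site says it does with one category of data."""
    category: str   # e.g. "name", "email"
    use: str        # e.g. "system admin", "research", "customisation"
    transfer: bool  # divulged within the organisation
    release: bool   # divulged to other organisations
    access: bool    # data subject can view the information


@dataclass(frozen=True)
class Preference:
    """What a user is prepared to accept for one category of data."""
    category: str
    allowed_uses: frozenset
    allow_release: bool
    require_access: bool


def acceptable(practice, preference):
    """Check one stated practice against one user preference."""
    if practice.category != preference.category:
        raise ValueError("practice and preference describe different data")
    if practice.use not in preference.allowed_uses:
        return False
    if practice.release and not preference.allow_release:
        return False
    if preference.require_access and not practice.access:
        return False
    return True


site = Practice("email", use="research", transfer=True, release=True, access=True)
user = Preference("email", frozenset({"system admin", "research"}),
                  allow_release=False, require_access=True)
print(acceptable(site, user))
```

Here the site's use is acceptable but its release to other organisations is not, so the user's software would decline or prompt.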

30

JTAP Calls

Digital Signatures
Studies to identify appropriate protocols and to test deployment. Seeking to fund an overview report and a technology deployment pilot.

Certificate Based Infrastructure Services
Technical overview and pilot. Seeking to fund an overview and technology watch project at a cost of £25,000, followed by one or two deployment pilots.

Work to start in Dec 1998

See <URL: http://www.jtap.ac.uk/bid/c14_98.html>

31

Privacy Services

TRUSTe:• An "independent, non-profit, privacy initiative

dedicated to building users' trust .. on the Internet"

• TRUSTe sites agree to:– Maintain an approved Privacy Statement

– Explain information gathering practices:

– What personal information will be used for

– Whether information will be disclosed

– Display the TRUSTe's Mark

• TRUSTe will periodically check conformance

• See <URL: http://www.etrust.org/>

32

What's Happening in UK?

A number of universities have provided guidelines governing Internet use:
• Data Protection
• Computer Misuse
• ..

But:
• Is work being duplicated?
• Is it still relevant?

http://www.cam.ac.uk/CS/DPA.html

33

What's Needed? Auditing Software

WebWatch
• Project based at UKOLN
• Monitors web technologies (not content)
• Potential for auditing robots.txt files?

Do we want software for auditing at a national or institutional level?

Can we follow the TRUSTe model?

34

What's Needed? Catalogue of Guidelines

A catalogue of UK HE web resources is being produced:
• Uses ROADS (cf. SOSIG, OMNI, etc.)
• Various categories planned:
– AUP
– Guidelines for authors
– Local search engines

• Feedback welcome

35

What's Needed? Education

Need for education for:
• End users
• Information providers
• System administrators
• Managers

Who provides training materials?

Who delivers the training?

36

Conclusions

• Widespread use of the Internet / ease of publishing has increased privacy concerns

• Need for education and awareness:
– End users
– Information providers
– System administrators (central & departmental)
• Do we want a system like TRUSTe?
• Need for auditing tools locally / nationally?
• Need to share experiences
• Need to be aware of (implement?) technical solutions