A Potted View of the Web Who, where, when, why. Approximately. Lee Gillam Department of Computing

A Potted View of the Web

Who, where, when, why.

Approximately.

Lee Gillam

Department of Computing

• ARPANET: c1967, decentralized computer network (US DoD) => INTERNET: c1983, relying on TCP/IP as a means to split data into packets and route them to computers

• Email: c1972, software for ARPANET

• Personal Computer (affordable?): c1980, Apple Lisa/Mac and IBM PC

• SGML: c1970; Hypertext: c1987, Apple’s Hypercard

• Web Browser: c1990, Tim Berners-Lee then at CERN, the European Organization for Nuclear Research

– Hypertext + Internet + PC to produce an information network enabling physicists at CERN to share experimental results

– First webpage: http://info.cern.ch/hypertext/WWW/TheProject.html

Key Milestones - some developments

• Browsers - TBL Browser called “WorldWideWeb”; Mosaic: 1993; Opera: 1994; Mozilla (Netscape): 1994; Internet Explorer: 1995 ….. IE, Firefox, Safari, ….

• Web Servers - Apache, IIS, …..

• Search Engines - AltaVista, Google, MSN, …..

• Estimated size of the web: 125m sites (July 2007); 88m (July 2006). Google: ~5bn pages (+/- 14bn) in English. 2001, English about 68%; 35% by 2004.

• What else? Video, Audio, IM, ….

• What next?

– Semantic Web - the machine-understandable web (TBL)

Key Milestones - some developments

• Using the Domain Name System (DNS)

– A distributed database that converts names to addresses. Overall management by the Internet Corporation for Assigned Names and Numbers (ICANN). US, 1998.

– http://www.bbc.co.uk (= http://www.bbc.tv) => http://212.58.224.116 (http://212.58.224.116:80/)

– Hypertext Transfer Protocol (HTTP): for moving information (HTML) around the web.

– HTTP client establishes a TCP connection to a port (usually 80) [email (SMTP) uses port 25]; HTTP server (program) responds with messages (e.g. 404 error) and data (HTML).

– Country code top-level domains (ccTLD): Nominet for .uk (.gb); .tv = Tuvalu, an island nation in the Pacific; .eu for the European Union. Generic top-level domains (gTLD): .com, .net, .org, .travel, …
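The steps above (name to address, TCP connection to port 80, request, status message) can be sketched in Python with the standard library only. This is a rough illustration under stated assumptions, not how any particular browser works: the helper names split_url, build_request, parse_status_line and fetch_status are invented for this example, and the network step sits in a function that only runs if you call it.

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, port, path); the port defaults to 80 for plain HTTP."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 80, parts.path or "/"


def build_request(host, path):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into (code, reason)."""
    parts = line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return int(parts[1]), reason


def fetch_status(url):
    """Resolve the name, open a TCP connection, return (code, reason).

    Needs network access, so nothing calls it automatically.
    """
    host, port, path = split_url(url)
    address = socket.gethostbyname(host)  # DNS: name -> address
    with socket.create_connection((address, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        reply = conn.makefile("rb").readline()  # first line of the response
    return parse_status_line(reply.decode("iso-8859-1").strip())
```

Calling fetch_status("http://www.bbc.co.uk") would perform a live lookup and request; the result depends on the server at the time, so none is shown here.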

How does it work?

• Think about a distributed telephone conversation

26

26 what?

26 days

Full days or working days?

• What do we assume?

How does it work?

<p>

<p>?

??

<para>

An unknown or unexpected error has occurred.

• Communication and commonality (standards) are vital (TCP/IP, HTTP, HTML….) ….. Machines are dumb ….. The Browser still has to present the (represented) information somehow.

How does it work?

• HTML: Elements, Attributes and Values

<a href="http://www.cs.surrey.ac.uk">Surrey Computing</a>

– Elements are delimited by tags, e.g. <a>, </a>

– Elements are additionally specified by attributes, e.g. href

– Values fill elements or attributes, e.g. http://www.cs.surrey.ac.uk and Surrey Computing

– Elements should be closed (with the corresponding </ >) - browsers can be forgiving
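To see the three parts of the link example separately, here is a small Python sketch using the standard html.parser module. The class name LinkInspector and the function inspect_markup are made up for illustration; real browsers do far more than this.

```python
from html.parser import HTMLParser


class LinkInspector(HTMLParser):
    """Record elements, attributes and values in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, dict(attrs)))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def inspect_markup(html):
    """Parse a fragment and return the recorded events."""
    parser = LinkInspector()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    link = '<a href="http://www.cs.surrey.ac.uk">Surrey Computing</a>'
    for event in inspect_markup(link):
        print(event)
```

If the closing </a> is left out, this parser still reports the start tag and the text but no "end" event, which is one small way of seeing what "forgiving" means.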

How does it work?

• HTML: Elements, Attributes and Values

– Elements and values are embedded, e.g.

<h1>
  Visit the
  <a href="http://www.cs.surrey.ac.uk">
    Surrey Computing
  </a>
  Website
</h1>

– 1[h1 2[text, a 3[@href 4[text]4, text]3, text]2 ]1.
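The nesting can also be shown as an indented outline. The following Python sketch is an illustration only: OutlineBuilder and outline are hypothetical names, and it assumes every element is explicitly closed (void elements such as <br> would throw the depth count off).

```python
from html.parser import HTMLParser

EXAMPLE = """<h1>
  Visit the
  <a href="http://www.cs.surrey.ac.uk">
    Surrey Computing
  </a>
  Website
</h1>"""


class OutlineBuilder(HTMLParser):
    """Turn nested markup into indented lines, one level per element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def _add(self, text):
        self.lines.append("  " * self.depth + text)

    def handle_starttag(self, tag, attrs):
        self._add(tag)
        self.depth += 1  # everything until the end tag is nested inside
        for name, value in attrs:
            self._add(f"@{name}: {value}")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace and newlines
        if text:
            self._add(f"text: {text}")

    def handle_endtag(self, tag):
        self.depth = max(0, self.depth - 1)


def outline(html):
    """Return the outline of a fragment as a list of indented lines."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.lines


if __name__ == "__main__":
    print("\n".join(outline(EXAMPLE)))
```

For the example, the outline puts h1 at the top level, the two text runs and the a element one level in, and @href and the link text one level further, mirroring the bracket notation.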