Exploring the Internet 91.113-001 Instructor: Michael Krolak Authors: P. D. & M. S. Krolak Copyright 2005


Blog of the Week

“Does my computer really know me? What an interesting thought. Could my computer come alive? Does it really know what I am thinking or how I am feeling? Sherry Turkle's article "Who Am We" made me stop to ponder for a bit.

Sherry Turkle writes:

"Granting a psychology to computers can mean that objects in the category 'machine,' like objects in the categories 'people' and 'pets,' are fitting partners for dialog and relationship. Although children increasingly regard computers as mere machines, they are also increasingly likely to attribute qualities to them that undermine the machine/person distinction."

This passage really struck a chord with me. Before this article, I always thought of my computer as a machine with no real function other than to provide me with information or maintain data as I needed. Sherry Turkle changed that ever so slightly. The computer is, after all, just a box with electrical components and wires. It needs no life-sustaining oxygen or food to survive.

The gaming lifestyle is a point that I could relate to. I have a teenage daughter at home, and she too plays Sims. Although I never asked her whether she imagined herself in the roles she creates, she did model her characters on herself and her friends, often stopping to let me know that this was her friend so-and-so or that this was what her husband was going to be like. We bought Sims for her when she was 13. She has since added several updates, or expansion packs, to the original game.”

Blog of the Week (cont.)

“She still plays the game, but not like she did in the past. She has returned to the real world. She does not proclaim the figures in the game to be figures of herself or her friends. I asked her if she still creates figures of someone she would like to be, and she told me that was crazy: "Dad, it's like not real life. It's like only a game. Be real, Dad." Then I get the look... If you are a parent, you know the one: the "get a life, Dad" look.

The so-called MUDers are a totally different breed. That is how I see it, anyway. They probably do have some identity or self-esteem issues. I have friends who are very big into computers, and I recall them talking about these "MUDs," or "rooms" as they called them. This was when computers were first being introduced. I was amazed at what they would do on the computer. I was also confused about the whole secret identity they would maintain. One friend, I will call him Fred, would be in his MUD for hours. I would ask him what he was doing, and he tried to explain it to me, to no avail. I just did not comprehend the whole idea of the room lifestyle. I never thought about it again until now. We are going back about eighteen years now. Man, am I old.

Although Fred never took it to as great an extreme as some of the folks in the article, he did get consumed by his personae in the room.

Humans are a very strange species, for sure. Some of the people from the article may have some serious identity issues. Perhaps they just want to be in a different place or a different time. I am not sure why they do what they do, but more power to them if they can truly differentiate between their multiple personae. Heaven help them if they cannot.”

Search for Information on the Web

Finding information on the web requires some understanding of how the various types of search engines work.

Archives that capture the changes in documents on the web over time are highly useful for those studying the social sciences, technology, and business dynamics.

“Intelligence is not the ability to store information, but to know where to find it.”

- Albert Einstein

How do we find information?

• Memory

• Media

– Books

– Movies

– Music

– Art

• Observe

• Ask other people

The Problem with the Internet

• The “Surface Web” contains 2.5 billion pages.

• Each day, 7.5 million web pages are added to the World Wide Web.

• Information is submitted to the web without any context or test of validity.

The Archives of the Web

1. Archives of the Web’s websites

2. Google’s archive of the Internet newsgroups.

The Wayback Machine

• Frustrated by dead links? There is an answer: the Wayback Machine at http://www.archive.org/

• Just enter the URL of the dead link; the Wayback Machine will show the link's history (how the page changed over time) and allow you to view archived copies of the dead page.
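The same lookup can be scripted. The Python sketch below assumes the Internet Archive's public Wayback availability endpoint (`https://archive.org/wayback/available`, which answers with JSON describing the closest snapshot) and the `web.archive.org/web/<timestamp>/<url>` snapshot-link pattern; the function names are hypothetical, and the network call is kept separate so the URL-building and reply-parsing steps can be checked offline.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's Wayback availability API.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query_url(url: str, timestamp: str = "") -> str:
    """Build the API request for a (possibly dead) link.

    timestamp is YYYYMMDD (optionally followed by hhmmss); the API answers
    with the snapshot closest to that date.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def snapshot_url(url: str, timestamp: str) -> str:
    """Build a snapshot link of the form web.archive.org/web/<timestamp>/<url>."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Pull the closest archived copy's URL out of the API's JSON reply, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url: str, timestamp: str = "") -> Optional[str]:
    """Ask the API over the network (needs Internet access)."""
    with urlopen(availability_query_url(url, timestamp), timeout=10) as reply:
        return closest_snapshot(json.load(reply))
```

For instance, `find_archived_copy("boston.com", "20041021")` would request the snapshot nearest Oct 21, 2004. The reply format belongs to the Internet Archive and may change, so treat `closest_snapshot` as an assumption to check against the live service.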

Google’s Newsgroup archive http://groups.google.com/

• Archives over 100,000 groups.

• For some groups, goes back over 30 years.

• There are for-fee sites that provide competitive services.

• Depending on the group, it can provide a treasure trove of insight into the cyber information society and its early history.

• Not all messages in the database are true, have merit or redeeming value, or are appropriate for children.

Searching for Information on the web

1. Search engines

2. Meta-Search engines

What is a Search Engine?

search engine n. 1. A software program that searches a database and gathers and reports information that contains or is related to specified terms.

2. A website whose primary function is providing a search engine for gathering and reporting information available on the Internet or a portion of the Internet.

Source: The American Heritage® Dictionary, Copyright © 2002, 2001, 1995 by Houghton Mifflin Company. Published by Houghton Mifflin Company.

Search Engines

Search engines have two parts:

1. The search component sends a program called a spider or bot (robot) out onto the Internet.

• It follows all the links and returns all the pages found.

• The pages are characterized by algorithms and stored in databases.

2. The retrieval system takes a query and maps it against the databases.

• The retrieval system rank-orders the responses by relevance.

• Each search engine uses a unique technique for retrieval and ranking.
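To make the two parts concrete, here is a toy Python sketch under stated assumptions: the "web" is a small hypothetical in-memory dictionary of pages and links, the spider does a breadth-first traversal of the links, the database is a simple inverted index of word counts, and ranking is plain term-frequency scoring. Real engines use far more elaborate (and unpublished) characterization and ranking algorithms.

```python
from __future__ import annotations

from collections import Counter, defaultdict, deque

# Hypothetical miniature web: URL -> (page text, outgoing links).
WEB = {
    "a.html": ("red sox win the world series", ["b.html", "c.html"]),
    "b.html": ("schilling pitches for the red sox", ["a.html"]),
    "c.html": ("boston weather report", ["d.html"]),
    "d.html": ("red sox parade in boston", []),
}


def crawl(start: str) -> dict[str, str]:
    """Part 1a: the spider follows every link breadth-first and returns the pages found."""
    found: dict[str, str] = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in found or url not in WEB:
            continue
        text, links = WEB[url]
        found[url] = text
        queue.extend(links)
    return found


def build_index(pages: dict[str, str]) -> dict[str, Counter]:
    """Part 1b: characterize each page by word counts and store them in an inverted index."""
    index: dict[str, Counter] = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index


def search(index: dict[str, Counter], query: str) -> list[str]:
    """Part 2: map the query against the index and rank pages by total term frequency."""
    scores: Counter = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, Counter()))
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(crawl("a.html"))
print(search(index, "red sox parade"))  # ['d.html', 'a.html', 'b.html']
```

Swapping in a different `search` scoring rule while keeping the same index is exactly where engines differ, which is why the same query can rank pages differently from one engine to the next.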

What are Meta Tags

meta tags n.

1. Attributes that describe information about the content of the document. Some spiders use these tags to determine the relevance of a site to future queries.

Example

<META NAME="keywords" CONTENT="red sox world champions schilling manny damon">
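A spider can read a tag like this with an ordinary HTML parser. The sketch below uses Python's standard `html.parser` module to pull the keywords out of a page; the class and function names are hypothetical, and splitting the content on commas and whitespace (the example tag above uses spaces, while many pages use commas) is an assumption made for illustration.

```python
import re
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the words in any <meta name="keywords" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so META/NAME/CONTENT match too.
        if tag != "meta":
            return
        fields = dict(attrs)
        if (fields.get("name") or "").lower() == "keywords":
            content = fields.get("content") or ""
            # Split on commas and/or whitespace, dropping empty pieces.
            self.keywords += [w for w in re.split(r"[,\s]+", content) if w]


def meta_keywords(html: str) -> list:
    """Return the keyword list declared in a page's meta tags."""
    parser = MetaKeywordParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


print(meta_keywords('<META NAME="keywords" CONTENT="red sox world champions schilling manny damon">'))
```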

How do search engines work?

Meta Search Engines

• Meta search engines may use their own resources to answer the question, but mostly they form a query from the user's input, package it, and send it off to many other search engines simultaneously (a process called spawning), then wait until the replies come back.

• After a fixed time, the meta search engine takes the responses received and pulls them together into a report.

• There are many ways to build a meta search engine on this idea. Some allow you to search only the web; others search newsgroups, newspapers, or scientific journals.
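The spawn, wait, and merge steps can be sketched in Python with the standard `concurrent.futures` module. The three engines below are hypothetical stand-in functions (a real meta search engine would send network requests to each one), the half-second deadline is an assumed value, and the merge rule (keep each URL once, taking engines in a fixed order) is one simple choice among many.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


# Hypothetical stand-ins for remote search engines: each returns a list of URLs.
def engine_fast(query):
    return [f"http://fast.example/{query}", "http://shared.example/page"]


def engine_medium(query):
    time.sleep(0.1)
    return ["http://shared.example/page", f"http://medium.example/{query}"]


def engine_slow(query):
    time.sleep(1.0)  # too slow: misses the deadline below
    return [f"http://slow.example/{query}"]


ENGINES = [engine_fast, engine_medium, engine_slow]


def meta_search(query, engines=ENGINES, deadline=0.5):
    """Spawn the query to every engine at once, wait a fixed time, then merge."""
    pool = ThreadPoolExecutor(max_workers=len(engines))
    futures = [pool.submit(engine, query) for engine in engines]
    done, _late = wait(futures, timeout=deadline)
    pool.shutdown(wait=False)  # do not block on late engines; their answers are dropped
    merged, seen = [], set()
    for future in futures:  # fixed engine order keeps the report deterministic
        if future in done and future.exception() is None:
            for url in future.result():
                if url not in seen:  # report each unique response once
                    seen.add(url)
                    merged.append(url)
    return merged


print(meta_search("sox"))
```

The fixed deadline is the trade-off the slide describes: a longer wait gathers more answers, a shorter one returns the report sooner.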

Why is an understanding of how a search engine works important?

• From the view of a user:

– The user wants to find the information with as few downloads as possible.

– The easier to use and the more accurate the ranking, the better.

• From the view of a web site developer:

– The developer wants the site to be found in the first 5-10 ranked responses to a query.

– The merit of a web design is often judged by its search rankings. This requires knowing how a given search engine ranks a page.

When in doubt ask a librarian:

• Librarians are trained professionals who are well versed in using the various WWW resources to find answers on a vast array of subjects.

• The librarian should be used for difficult searches, but the student should wisely observe, learn, and contemplate the librarian's techniques, resources, and methods.

What is a Subject Directory?

subject directory n. 1. An Internet research tool on the World Wide Web that organizes Internet resources by subject headings and subheadings. Subject directories are usually compiled by human beings who apply some selection criteria to resources included in the database.
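Structurally, a subject directory is a tree of headings and subheadings, with human-selected resources hanging off each node. This Python sketch uses hypothetical categories and placeholder `example` URLs; browsing means following a path of headings rather than typing a query.

```python
# Hypothetical directory: each heading maps to subheadings ("sub") and
# resources chosen by human editors ("sites").
DIRECTORY = {
    "Sports": {
        "sites": [],
        "sub": {
            "Baseball": {
                "sites": ["http://baseball.example/"],
                "sub": {
                    "Red Sox": {"sites": ["http://redsox.example/"], "sub": {}},
                },
            },
        },
    },
    "Science": {"sites": ["http://science.example/"], "sub": {}},
}


def browse(path):
    """Follow a path of headings, e.g. ["Sports", "Baseball"], and return that node."""
    node = {"sites": [], "sub": DIRECTORY}  # the top level acts as the root node
    for heading in path:
        node = node["sub"][heading]  # raises KeyError if the heading does not exist
    return node


def all_sites(node):
    """List a node's resources plus those under all of its subheadings."""
    sites = list(node["sites"])
    for child in node["sub"].values():
        sites.extend(all_sites(child))
    return sites


print(all_sites(browse(["Sports"])))
```

Unlike a search engine's index, nothing here is discovered automatically: an editor decides where each site belongs, which is the selection step the definition describes.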

Examples of Subject Directories

• www.yahoo.com Yahoo!

• http://bubl.ac.uk/ BUBL

• http://www.ipl.org/ Internet Public Library

• www.about.com About.com

• www.jumpcity.com Jump City

• http://www.joeant.com/ Joe Ant

What is a Meta Search Engine?

meta search engine n. 1. A search engine that uses its own database and also sends the query to many other search engines simultaneously (called spawning), reporting the unique responses from those engines. 2. A meta search engine that is limited to only the web, newsgroups, newspapers, or scientific journals.

Examples of Meta Search Engines

• Ask Jeeves -- frequently gets the answer in the first pass. Jeeves allows queries in natural language.

• Dogpile -- for its variety of sources (web, newsgroups, newspapers)

• Ixquick

• Metacrawler

• ProFusion

The Deep Web

What is the invisible or deep web?

• Invisible Web (n.) Also referred to as the deep Web, the term refers to either Web pages that cannot be indexed by a typical search engine or Web pages that a search engine purposely does not index, rendering the data “invisible” to the general user. One of the most common reasons that a Web site’s content is not indexed is because of the site’s use of dynamic databases, which opens the door for a potential spider trap. Web pages can also fall into the invisible Web if there are no links leading to them, since search engine spiders typically crawl through links that lead them from one destination to another. Data on the invisible Web is not inaccessible; the information is out there—it is stored on a Web server somewhere and can be accessed using a browser—but the data must be found using means other than the general-purpose search engines, such as Google and Yahoo!.

Source: http://www.webopedia.com/TERM/I/invisible_Web.html

The deep web

• The deep web is not mysterious; it simply means that normal search engines, whose spiders go from one link to another, will not work with pages that are generated on the fly from data requested from a database, or with pages that are not linked to other pages, etc.

• Examples of deep web sites are the yellow or white pages, catalogues, and patents.

• Google can index and search PDF, text, and Word documents.
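Why a link-following spider misses these pages can be shown with a toy example. In this hypothetical site, the static pages link to each other, a search form produces result pages only when someone submits a query (simulated by `result_page`), and one stored page has no inbound links. A breadth-first link crawl never reaches either kind of page, even though a browser could load both.

```python
from __future__ import annotations

from collections import deque

# Hypothetical site: each static page and the links it contains.
LINKS = {
    "/index.html": ["/about.html", "/search.html"],
    "/about.html": ["/index.html"],
    "/search.html": [],      # a query form: results exist only after a query is submitted
    "/old-report.html": [],  # stored on the server, but no page links to it
}

# Hypothetical database behind the search form (think white pages).
PHONE_BOOK = {"smith": "555-0100", "jones": "555-0199"}


def result_page(name: str) -> tuple[str, str]:
    """Generate a results page on the fly from the database: (its URL, its content)."""
    return f"/search.html?name={name}", PHONE_BOOK.get(name, "not listed")


def spider(start: str) -> set[str]:
    """Visit every page reachable by following links from start (breadth-first)."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page not in seen:
            seen.add(page)
            queue.extend(LINKS.get(page, []))
    return seen


def invisible_pages(start: str = "/index.html") -> set[str]:
    """Pages a browser could load that the link-following spider never visits."""
    generated = {result_page(name)[0] for name in PHONE_BOOK}
    return (set(LINKS) | generated) - spider(start)


print(sorted(invisible_pages()))
# ['/old-report.html', '/search.html?name=jones', '/search.html?name=smith']
```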

What is the Deep Web?

• Estimated to be 500 times the size of the surface web (about 1.25 trillion pages).
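The 1.25 trillion figure follows from multiplying the surface-web estimate given earlier in this section (2.5 billion pages) by 500:

```latex
500 \times \left(2.5 \times 10^{9}\ \text{pages}\right) = 1.25 \times 10^{12}\ \text{pages} \approx 1.25\ \text{trillion pages}
```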

Using the Search tools to find information on the web

Successful searching

Plan your search:

1. Which words will appear only on the right web page? Should they all be there, or are there alternatives? The most specific concept is the best.

2. If you do not know your topic well, use a meta search engine to get up to speed. Then refine your search with a search engine like Google or AltaVista.

3. If the topic is technical, use a virtual library site to find information reviewed by experts.

What is Boolean Logic?

We use Boolean logic to evaluate the truth of one or more propositions. There are three important operators: AND, OR, NOT.

• AND – true only if A and B are both true.

• OR – true if A is true, B is true, or both are true.

• NOT – true only when A is false.

When searching for information, we use Boolean logic to find results that are relevant to our search terms. If a web page is relevant to a search term, the search engine evaluates the page as true.
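The three operators can be checked against every combination of truth values. This short Python sketch builds the truth table with Python's own `and`, `or`, and `not`, which follow the same rules (in particular, OR is true when either or both inputs are true).

```python
from itertools import product


def truth_table():
    """Return rows of (A, B, A AND B, A OR B, NOT A) for every combination of A and B."""
    return [(a, b, a and b, a or b, not a) for a, b in product([True, False], repeat=2)]


print("A, B, A AND B, A OR B, NOT A")
for row in truth_table():
    print(row)
```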

Examples of Searching with Boolean Logic

• Yankees and Choke

– All web pages that contain the terms Yankees and Choke.

• Yankees or Choke

– All web pages that contain the word Yankees.

– All web pages that contain the word Choke.

– All web pages that contain the terms Yankees and Choke.

• Choke and not Yankees

– All web pages that contain the word Choke, but don’t contain the word Yankees.
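Behind the scenes, these three queries map onto set operations over an inverted index: AND is set intersection, OR is union, and AND NOT is set difference. A Python sketch over four hypothetical pages:

```python
# Hypothetical pages, keyed by a short page ID.
PAGES = {
    "p1": "yankees choke in game seven",
    "p2": "yankees sign a new pitcher",
    "p3": "do not choke on your food",
    "p4": "red sox win the world series",
}

# Inverted index: word -> set of pages containing that word.
INDEX: dict = {}
for page_id, text in PAGES.items():
    for word in text.split():
        INDEX.setdefault(word, set()).add(page_id)


def pages_with(word):
    """Look up the set of pages containing word (case-insensitive)."""
    return INDEX.get(word.lower(), set())


yankees, choke = pages_with("Yankees"), pages_with("Choke")
print("Yankees and Choke:", sorted(yankees & choke))      # ['p1']
print("Yankees or Choke:", sorted(yankees | choke))       # ['p1', 'p2', 'p3']
print("Choke and not Yankees:", sorted(choke - yankees))  # ['p3']
```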

More Advanced Uses of Boolean Logic

• If you are looking for a proper name, a phrase, or another collection of words that normally are found together, enclose them in double quotes, e.g. "President Gerald Ford".

• If the web page must contain one or more specific words, use the logical And, e.g. President And Ford And "United States".

• If the web page may contain different forms of the name, titles, etc., use the logical Or, grouping the alternatives in parentheses where the engine supports them, e.g. (President Or "Vice President" Or Representative) And "Gerald Ford".

• If the document should exclude a word or phrase, use the logical Not, e.g. "Gerald Ford" And Not "Ford automotive" And Not "Ford car" And Not "Ford truck".
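Grouping matters when And and Or are mixed. Engines differ in how they group such queries (in standard Boolean algebra AND binds tighter than OR), so without parentheses `President Or "Vice President" Or Representative And "Gerald Ford"` may be read as President Or "Vice President" Or (Representative And "Gerald Ford"). The Python sketch below represents queries as nested tuples (a hypothetical format, not any engine's syntax), treats a quoted phrase as a run of consecutive words, and evaluates both groupings against two made-up pages.

```python
import re


def words(text):
    """Lowercase a page and split it into words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def has_phrase(text, phrase):
    """True if the words of phrase appear consecutively in text."""
    page, target = words(text), words(phrase)
    n = len(target)
    return any(page[i:i + n] == target for i in range(len(page) - n + 1))


def matches(text, query):
    """Evaluate a query tree: a string is a word or phrase; tuples are operators."""
    if isinstance(query, str):
        return has_phrase(text, query)
    op, *args = query
    if op == "AND":
        return all(matches(text, a) for a in args)
    if op == "OR":
        return any(matches(text, a) for a in args)
    if op == "NOT":
        return not matches(text, args[0])
    raise ValueError(f"unknown operator {op!r}")


PAGES = {
    "ford": "Gerald Ford served as President of the United States.",
    "other": "The President spoke about trucks.",
}

# President Or "Vice President" Or (Representative And "Gerald Ford")
ungrouped = ("OR", "President", "Vice President", ("AND", "Representative", "Gerald Ford"))
# (President Or "Vice President" Or Representative) And "Gerald Ford"
grouped = ("AND", ("OR", "President", "Vice President", "Representative"), "Gerald Ford")

for name, q in [("ungrouped", ungrouped), ("grouped", grouped)]:
    print(name, [url for url, text in PAGES.items() if matches(text, q)])
```

The ungrouped reading returns both pages because any page mentioning "President" qualifies; only the parenthesized form requires "Gerald Ford".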

Other Helpful Hints

• While not part of Boolean logic, operators such as NEAR and FOLLOWED BY are allowed by some search engines to indicate the relationship of words or phrases to other words and phrases. Normally these relations specify which word comes first, or whether a word is within a certain number of words of the first word. This concept is called proximity logic.

• Not all search engines use the AND, OR, NOT notation; some, like AltaVista, use "+" for AND and "-" for NOT.
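Proximity operators can be evaluated from word positions. In this Python sketch, `near` and `followed_by` are hypothetical helper names and the window size `k` is an assumed parameter (each engine has its own syntax and default distance); `parse_plus_minus` shows how the "+"/"-" notation splits a query into required, excluded, and optional words.

```python
import re


def positions(text, word):
    """Indexes at which word occurs in text (case-insensitive, punctuation ignored)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [i for i, t in enumerate(tokens) if t == word.lower()]


def near(text, a, b, k=5):
    """NEAR: true if a and b occur within k words of each other, in either order."""
    return any(abs(i - j) <= k for i in positions(text, a) for j in positions(text, b))


def followed_by(text, a, b, k=5):
    """FOLLOWED BY: true if b occurs after a, at most k words later."""
    return any(0 < j - i <= k for i in positions(text, a) for j in positions(text, b))


def parse_plus_minus(query):
    """Split a query like '+yankees -choke world' into (required, excluded, optional) words."""
    required, excluded, optional = [], [], []
    for term in query.split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)
    return required, excluded, optional


text = "The Red Sox beat the Yankees in game seven"
print(near(text, "sox", "yankees", k=3))     # True: three words apart
print(followed_by(text, "yankees", "sox"))   # False: Sox comes first
print(parse_plus_minus("+yankees -choke world"))
```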

Tips for Using Search Engines

• When searching a large-scale database, it is important to be extremely precise.

• Avoid using vague or common words that will only produce millions of pages.

• Read the instructions for each new search engine you use. There are many different methods of searching between the search engines and subject directories.

Finding Audio and Video

• http://www.alltheweb.com/ – video, audio, news

• http://images.google.com – Good source of images

• www.dogpile.com – One of the few search engines that provides searches for video.

• www.fazzle.com – Provides limited video and image searching capabilities

• http://video.google.com/ – A new beta product; may have bugs.

• http://www.tssphoto.com/ – Stock photo images

Dogpile for finding non-text based files

The number of sites that allow so-called “anonymous” or “guest” FTP directories has greatly diminished. Due to security considerations, most sites no longer have non-text directories that are open to search and file download. Dogpile still allows searches for images, audio, and videos.

Lab Exercise

1. Tell me in 100 words or less what the Flying Spaghetti Monster is.

2. Show me a video of the Spaghetti Harvest

3. Show me an image of an atoll

4. Show me the headline image on the boston.com website from Oct 21, 2004