In Search of Truth on the World Wide Web
Fluency with Information Technology
2012-01-25 Katherine Deibel, Fluency in Information Technology 1
INFO100 and CSE100
Katherine Deibel
The Key to an Information Society
Search allows us to manage large amounts of information
One of the largest areas of research within Artificial Intelligence
Finding true information in the library is easier than finding it on the Web because
A. Librarians usually choose only authoritative sources.
B. The truth of information on paper pages is higher than on electronic pages.
C. Most books in libraries are old, and people used to be more truthful in the past.
D. None of these choices.
Librarians choose what is available in the library, and they try to get the best information possible.
Libraries pre-vet the information for you.
Books have limited research value because they
A. Take a long time to produce
B. Contain only information the author selects
C. Contain only a single point of view
D. All of the above
Of course, books are not a bad source of information, but these factors must be taken into account.
History: Book Indices
How is a book index made? It is not just a list of all the words in the book (that is called a concordance).
Human-led process
Subjective selection of key terms and locations in the text
Cross-referencing of related terms to aid the reader
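To make the contrast concrete, here is a minimal Python sketch of a concordance: it mechanically records every word and the pages it appears on, with no human selection or cross-referencing. The `build_concordance` helper and the three-page sample "book" are hypothetical, made up for illustration.

```python
import re
from collections import defaultdict


def build_concordance(pages: dict[int, str]) -> dict[str, list[int]]:
    """Map every word to the sorted list of pages it appears on.

    Unlike a human-made index, nothing is selected or cross-referenced:
    every word, however trivial, gets an entry.
    """
    concordance: defaultdict[str, set[int]] = defaultdict(set)
    for page_number, text in pages.items():
        # Lowercase, then treat any run of letters as a word.
        for word in re.findall(r"[a-z]+", text.lower()):
            concordance[word].add(page_number)
    return {word: sorted(found) for word, found in concordance.items()}


# Hypothetical three-page "book".
pages = {
    1: "Fluency with information technology is more than literacy.",
    2: "The index of a book is made by a person.",
    3: "A concordance lists every word, including the word 'the'.",
}
concordance = build_concordance(pages)
print(concordance["the"])      # [2, 3]
print(concordance["fluency"])  # [1]
```

Note that even "the" and "a" get entries; a human indexer would leave them out.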
Book Index and Search
Take the textbook and look up 'fluency' in the index. How many pages does it list?
Imagine a digital version of the book. How many times will a search find the word 'fluency'?
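A digital search, by contrast, reports every match. The sketch below assumes a case-insensitive, whole-word match (real e-reader search behavior varies); the `count_hits` helper and the sample pages are hypothetical.

```python
import re


def count_hits(pages: list[str], term: str) -> tuple[int, int]:
    """Return (total matches, number of pages with at least one match)."""
    # \b anchors keep 'fluency' from matching inside a longer word.
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    per_page = [len(pattern.findall(page)) for page in pages]
    return sum(per_page), sum(1 for n in per_page if n > 0)


pages = [
    "Fluency in Information Technology",  # a running header
    "Fluency means more than literacy. Fluency takes practice.",
    "Chapter 2 covers the Web.",
]
total, pages_with_hits = count_hits(pages, "fluency")
print(total, pages_with_hits)  # 3 2
```

Running headers, captions, and passing mentions all count as hits, so a search total can far exceed the number of pages a human indexer chose to list.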
Discuss with your neighbor
Electronic books are common, and many e-readers provide search functionality. Which is more helpful: the human-created index or the search engine? Does it differ by task? How so?
Lesson
Search Is a Tool, Not a Solution!
Searching Wisely: Making Google, Bing, etc. Work for You
Looking In the Right Place
Google and other search engines are not always the first place to look!
You might be able to guess the site and the URL you need:
Need tax information: irs.gov
Need spelling help: dictionary.com
When is "Leverage" on? tnt.com
Oops… that should be tnt.tv
For a word to be hit in a Web search such as Google, the word
A. Must be used on the hit page.
B. Must be used on the hit page more than once.
C. Could be on another page as anchor text in the link to the hitting page.
D. None of these choices.
The word must be descriptive of the page, meaning that it is:
• used on the page,
• in the URL for the page, or
• used in the anchor text of a link to the page.
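The three conditions above can be sketched as a single check. This is a simplified model that assumes a page is represented by its text, its URL, and the anchor text of links pointing to it from other pages; the `Page` type, `is_hit` helper, and sample data are hypothetical, and real search engines weigh these signals in far more complex ways.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    text: str
    # Anchor text of links on *other* pages that point to this page.
    incoming_anchor_texts: list[str] = field(default_factory=list)


def is_hit(page: Page, word: str) -> bool:
    """True if the word describes the page in any of the three ways."""
    word = word.lower()
    on_page = word in page.text.lower().split()
    # Substring test on the URL: crude, but enough for a sketch.
    in_url = word in page.url.lower()
    in_anchor = any(word in anchor.lower().split()
                    for anchor in page.incoming_anchor_texts)
    return on_page or in_url or in_anchor


page = Page(
    url="http://example.com/monet-gallery",
    text="Paintings of water lilies from Giverny",
    incoming_anchor_texts=["Impressionist masterpieces"],
)
print(is_hit(page, "lilies"))         # True: used on the page
print(is_hit(page, "monet"))          # True: in the URL
print(is_hit(page, "impressionist"))  # True: only in anchor text
print(is_hit(page, "sculpture"))      # False
```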
Page rank on a search engine is usually based on popularity.
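One well-known popularity measure is Google's PageRank, which treats a link from one page to another as a vote for the target page. The sketch below is a simplified power-iteration version over a hypothetical four-page link graph. It assumes the commonly cited damping factor of 0.85 and that every page has at least one outgoing link; production ranking combines many more signals.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    Assumes every page has at least one outgoing link and every link
    target is itself a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal popularity
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            # Each page splits its current rank evenly among its link targets.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: most pages link to "hub".
links = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Pages "b" and "c" receive no links, so they stay at the minimum score, while "hub" ends up the most "popular".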
How many major parts does a search engine have?
A. one
B. two
C. more
D. don't know
A search engine has two parts:
• The Web crawler
• The query processor
The query processor _______.
A. Checks search terms against the database of Web pages, or index
B. Cleans up the search terms
C. Follows links to Web pages to find matching terms
D. A and B
E. B and C
F. Don't know
Search Engines
No one controls what's published on the WWW; it is totally decentralized.
To find out what is published, search engines crawl the Web. They have two parts:
A crawler visits Web pages, building an index of the content (stored in a database).
A query processor checks user requests against the index and reports on known pages.
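The two parts can be sketched in a few lines of Python. This is a toy model that assumes the "Web" is an in-memory dictionary of hypothetical pages and links rather than real HTTP fetches: the crawler follows links from a starting page and builds an inverted index (word to set of URLs), and the query processor cleans up the query terms and checks them against that index. The names `WEB`, `crawl`, and `process_query` are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical miniature Web: URL -> (page text, outgoing links).
WEB = {
    "a.com": ("Monet painted water lilies", ["b.com", "c.com"]),
    "b.com": ("Van Gogh painted sunflowers", ["a.com"]),
    "c.com": ("Water lilies grow in ponds", []),
    "d.com": ("Nobody links here", []),  # never reached by the crawler
}


def crawl(start: str) -> dict[str, set[str]]:
    """Breadth-first crawl from `start`, building an inverted index."""
    index: dict[str, set[str]] = defaultdict(set)
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        for word in text.lower().split():
            index[word].add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def process_query(index: dict[str, set[str]], query: str) -> set[str]:
    """Clean up the terms, then return pages containing all of them."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


index = crawl("a.com")
print(sorted(process_query(index, "Water LILIES")))  # ['a.com', 'c.com']
print(process_query(index, "nobody"))                # set(): d.com was never crawled
```

In this toy model, d.com shows how content can be missed: a page that no crawled page links to is never discovered.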
Only a fraction of the Web’s content is crawled
The Index
Constantly updated
Huge! According to http://www.worldwidewebsize.com/, Google's index was averaging 49 billion Web pages as of January 2012.
The Truth about the Index
It's not just one index.
[Figure, "The Illusion": Alice and Bob both query a single Google Index.]
[Figure, "The Reality": Alice and Bob each query a different copy of the Google Index.]
The large database and high demand require a distributed approach.
Separate servers run different versions of the index.
Each server is updated at different times and rates.
Alice and Bob's Results
Their search results will differ slightly.
The first page of results will likely be the same.
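A toy simulation can illustrate why. It assumes two hypothetical index servers holding snapshots of the same ranking taken at different update times, with Alice's query routed to one and Bob's to the other. The site names, the `SERVERS` table, and the `search` helper are invented for illustration; a real engine's routing and update schedule are far more complex.

```python
# Hypothetical ranked results for one query, captured at two update times.
RANKING_MONDAY = ["wiki.org", "news.com", "blog.net", "shop.com", "forum.org"]
RANKING_TUESDAY = ["wiki.org", "news.com", "blog.net", "forum.org", "new.com"]

# Each server was last refreshed at a different time.
SERVERS = {"server-1": RANKING_MONDAY, "server-2": RANKING_TUESDAY}


def search(server: str, per_page: int = 3) -> list[list[str]]:
    """Return the server's results split into pages of `per_page` hits."""
    ranking = SERVERS[server]
    return [ranking[i:i + per_page] for i in range(0, len(ranking), per_page)]


alice = search("server-1")
bob = search("server-2")
print(alice[0] == bob[0])  # True: the first page matches
print(alice[1], bob[1])    # ['shop.com', 'forum.org'] ['forum.org', 'new.com']
```

The top results tend to be stable between updates, so the differences show up further down the list.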
Lessons
Search results constantly change
Bookmark the sites you find... They might drop down in the results
Collaborative searching can be tricky
Boolean Queries
Search engine terms are independent
Words don't have to occur together
Use Boolean queries and quotes
Logical Operators: AND, OR, NOT
monet AND water AND lilies
“van gogh” OR gauguin
vermeer AND girl AND NOT pearl
Search for Mona Lisa
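Conceptually, Boolean operators map onto set operations over the pages that contain each term: AND is intersection, OR is union, and AND NOT is difference. A minimal sketch, assuming a hypothetical index that already maps each term (or quoted phrase) to page IDs; `INDEX` and `pages_for` are invented names:

```python
# Hypothetical inverted index: term or quoted phrase -> IDs of pages containing it.
INDEX = {
    "monet": {1, 2, 5},
    "water": {1, 3, 5},
    "lilies": {1, 5, 6},
    "van gogh": {7, 8},  # quoted phrase treated as a single term here
    "gauguin": {8, 9},
    "vermeer": {10, 11, 12},
    "girl": {10, 11},
    "pearl": {11},
}


def pages_for(term: str) -> set[int]:
    # Unknown terms match no pages.
    return INDEX.get(term, set())


# monet AND water AND lilies -> intersection
print(sorted(pages_for("monet") & pages_for("water") & pages_for("lilies")))  # [1, 5]
# "van gogh" OR gauguin -> union
print(sorted(pages_for("van gogh") | pages_for("gauguin")))  # [7, 8, 9]
# vermeer AND girl AND NOT pearl -> difference
print(sorted((pages_for("vermeer") & pages_for("girl")) - pages_for("pearl")))  # [10]
```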
AND is the default
Most search engines will return pages containing ALL of your search terms
Too many words in a search could hurt
The OR operator is helpful here: IBM stock prices 2005 OR 2006 OR 2007
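One way to model "AND is the default" is to treat a query as a list of groups: words joined by OR form one group, and every group must match. This sketch assumes OR binds more tightly than the implicit AND (so in the IBM example the years are alternatives, while "IBM", "stock", and "prices" are all required); the `parse` and `matches` helpers and the sample documents are hypothetical.

```python
def parse(query: str) -> list[set[str]]:
    """Split a query into groups: implicit AND between groups, OR within one."""
    groups: list[set[str]] = []
    join_next = False
    for token in query.split():
        if token == "OR":
            join_next = True
        elif join_next and groups:
            groups[-1].add(token.lower())  # OR: add to the previous group
            join_next = False
        else:
            groups.append({token.lower()})  # implicit AND: start a new group
    return groups


def matches(doc_words: set[str], groups: list[set[str]]) -> bool:
    # Every group needs at least one of its words in the document.
    return all(group & doc_words for group in groups)


groups = parse("IBM stock prices 2005 OR 2006 OR 2007")
print(sorted(groups[-1]))  # ['2005', '2006', '2007']

docs = {
    "report-2006": {"ibm", "stock", "prices", "2006"},
    "report-2010": {"ibm", "stock", "prices", "2010"},
    "ibm-history": {"ibm", "history", "2005"},
}
print([name for name, words in docs.items() if matches(words, groups)])
# ['report-2006']
```

Because every AND group must match, each extra word can only narrow the results, which is why too many words in a search can hurt.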
Google Advanced
Search Strategies
Limit by top-level domain or format
Find the terms most specific to the topic
Look elsewhere for key words, e.g., bio
Use an exact phrase only when it is universal
If there are too many hits, re-query:
Add another search term
Decide whether you want AND or OR
Try quotes around paired words
Terms most specific to topic
The key to good research
My dissertation example:
I kept looking for: assistive technology rejection
The research terms are: abandonment or discontinuance
Went from finding fewer than 10 papers to more than 100 papers
Search Engine Tricks
These apply to Google, but many search engines have similar features:
"several words" searches for the words in that order
"word" searches only for that word and not any synonyms, plurals, etc.
-word excludes word from the search
site:url searches only on a specific site
~word includes synonyms of word in the search
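Several of these operators can be sketched over a tiny in-memory collection. This is a simplified model, not Google's actual implementation: "several words" is treated as a consecutive-word match, -word excludes pages containing the word, and site: keeps only pages whose host is that domain or one of its subdomains. The `PAGES` data and the helper names are hypothetical, and ~word is left out because synonym handling would need a thesaurus.

```python
import re
from urllib.parse import urlparse

# Hypothetical pages: URL -> text.
PAGES = {
    "https://www.irs.gov/forms": "download tax forms and instructions",
    "https://taxblog.example.com/tips": "tax forms made easy and tax tips",
    "https://news.example.com/story": "forms of tax reform debated",
}


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def has_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's words appear consecutively and in order."""
    target, tokens = words(phrase), words(text)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def search(query: str) -> list[str]:
    # Pull out quoted phrases first, then handle the remaining tokens.
    phrases = re.findall(r'"([^"]+)"', query)
    rest = re.sub(r'"[^"]+"', " ", query).split()
    results = []
    for url, text in PAGES.items():
        tokens = set(words(text))
        host = urlparse(url).hostname or ""
        ok = all(has_phrase(text, p) for p in phrases)
        for token in rest:
            if token.startswith("site:"):
                domain = token[len("site:"):]
                # Match the domain itself or any subdomain of it.
                ok = ok and (host == domain or host.endswith("." + domain))
            elif token.startswith("-"):
                ok = ok and token[1:].lower() not in tokens
            else:
                ok = ok and token.lower() in tokens
        if ok:
            results.append(url)
    return results


print(search('"tax forms"'))
# ['https://www.irs.gov/forms', 'https://taxblog.example.com/tips']
print(search('"tax forms" -tips'))  # ['https://www.irs.gov/forms']
print(search("tax site:irs.gov"))   # ['https://www.irs.gov/forms']
```

The news story contains both "tax" and "forms" but not the phrase "tax forms", so the quoted query skips it.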
Further Google Tricks
define word
67 to hex
1072 * 35
150 GBP in USD
Do a barrel roll
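Two of these tricks are plain computations you can check yourself, for example in Python. The currency conversion is left out because it depends on a live exchange rate.

```python
print(hex(67))    # 0x43, since 4 * 16 + 3 = 67
print(1072 * 35)  # 37520
```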
Judging Credibility: When Finding Is Not Enough
Much of the Information on the Web is
A. Wrong.
B. Correct.
C. Pictures of cats.
D. Pornography.
E. Meaningless chatter.
F. Of varying usefulness and credibility.
You've heard it before…
Accessing information from the web is easy
But you must be careful
Anyone can post anything
It is easy to fake authority
Don't use Wikipedia
Let's take a different look
The Web is a common source of information
The problem with misleading information lies in people accepting it at face value.
Writing Guides: The Checklist
Authorship: Is there an author? You may need to… Can you tell whether the author is knowledgeable and credible? If the author's qualifications aren't listed…
Sponsorship: What does the URL tell you? The URL ending often specifies the type of group hosting the site: commercial (.com), educational (.edu), nonprofit (.org), …
Currency: How current is the site? How current are the site's links? If many of the links no longer work, the site may be too dated for your purposes.
Excerpt from Hacker’s A Pocket Manual of Style (2008)
Pitfalls of the Checklist
Authorship
Ignores the complexity of Web authorship
Encourages the use of titles, degrees, and symbols of authority to determine credibility
Sponsorship
INACCURATE: .org has never been restricted to only nonprofits
Not all domains are regulated
Domains reflect only general purposes and not specific pages
Currency
Suggests that recent data is more reliable
Update frequency will vary by the type of site
You need the population of Seattle, WA, in 1998. Where do you look up this information?
A. Wikipedia
B. 2012 World Almanac
C. 1998 World Almanac
D. 1999 World Almanac
E. A 1962 encyclopedia
Criticisms of the Checklist
Inherent problems:
Emphasis on surface features over content
Simplistic yes/no questions with no guidance
Erroneous indicators of credibility
Students fail to develop information literacy skills and critical practices
Need for better evaluative methods to develop sustained, transferable skills
A Different Approach
Determining usefulness and credibility is a process
Readers should engage in repeated dialogues with the document
The questions of usefulness and credibility vary by discipline
Q6C: The Evaluation Process
Question
Categorize
Critique Rhetorically
Characterize Authorship
Contextualize
Corroborate
Conclude
Repeat as necessary
Developed by K. Deibel, S. Read, and T. Wright
Q6C: Question
Maintain a skeptical frame of mind
Ask questions relevant to your research
Q6C: Categorize
In the context of your research, is this a primary, secondary, or tertiary source?
What type of site is it (website, blog, wiki, database, etc.)?
Q6C: Critique Rhetorically
What do the authors’ choice of words, tone, font, display format, images, genre, and argumentative strategies tell you about the intended audience and the credibility and reliability of this site? (‘Read’ the site.)
Q6C: Characterize Authorship
Identify who created the content, when they created it, and for what purpose.
Single or multiple authors? Committee? Institution? Critic? Expert? Unknown? Other?
Q6C: Contextualize
Place the information collected in conversation with your existing experience and body of knowledge.
Does it fit? How?
Q6C: Corroborate
Assess how the content compares to other sources.
Is the content consistent, complementary, or contradictory?
Q6C: Conclude
How credible is the source?
Is the source useful for your research goals?
If unsure, ask more questions.
If the source is not credible or not useful, find a new source and repeat the Q6C process.
Parting Thoughts on Chapter 6
Example of the fun of research and exploration of knowledge
Side quests are surprising. What I learned from my dissertation:
The popularity of eyeglasses in a Renaissance European nation
'Ghoti' can be pronounced as 'fish'
Always ask these questions
When working on a task involving researching information, ask yourself:
How useful is this information for my current purposes?
How credible is this information for me to rely on and to pass on to others (i.e., cite)?
How many patents did Buckminster Fuller file and hold?
A. 10
B. 28
C. 88
D. 2000+
E. Over 9000
How did we find out that Buckminster Fuller didn't hold 2000 patents?
A. We checked at the patent office.
B. We accepted the claims made by encyclopedia.com as credible.
C. We checked the rules for patents and found that no one can be issued more than 100 patents in a lifetime.
D. All of these answers
Announcements
Keep working on Project 1A
Keep up the GoPost discussions
There will be a WebQ quiz for Thursday/Friday labs
You only get one shot at submitting!