colin-sullivan
WIRED Future
• Quick review of Everything
• What I do when searching, seeking and retrieving
• Questions?
• Projects and Courses in the Fall
• Course Evaluation
WIRED Focus
• Information Retrieval: representation, storage, organization of, and access to information items
• Focus is on the user information need
• User information need:
  - Find all docs containing information on Austin which:
    • Are hosted by utexas.edu
    • Discuss restaurants
• Emphasis is on the retrieval of information (not data, not just a keyword match)
Quick Overview of the IR Process
[Diagram labels: Documents, index, Information Need, query, match, Ranking, documents, ?]
Indexing and Searching
• Query models work against the index
  - Find words, word counts, phrases
  - Sequential search, indexed search
• Inverted Files & Other Indices
• Boolean Queries
• Sequential Searching
• Pattern Matching
• Structural Queries
• Data structures
  - The infrastructure of search
  - Varied per data set and query contexts
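The contrast between indexed and sequential search can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the sample documents and the function names (`build_inverted_index`, `boolean_and`, `boolean_or`, `sequential_search`) are assumptions made up for this example, and tokenization is a plain whitespace split.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return ids of documents containing every term (Boolean AND)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Return ids of documents containing at least one term (Boolean OR)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def sequential_search(docs, term):
    """Scan every document in turn; needs no index but reads all the text."""
    return {doc_id for doc_id, text in docs.items()
            if term.lower() in text.lower().split()}


# Hypothetical sample collection, invented for illustration.
docs = {
    1: "Austin restaurants near campus",
    2: "Austin live music venues",
    3: "Dallas restaurants guide",
}
index = build_inverted_index(docs)

print(boolean_and(index, "austin", "restaurants"))
print(boolean_or(index, "austin", "dallas"))
print(sequential_search(docs, "restaurants"))
```

The index is built once and each query then touches only the postings for its terms, whereas the sequential scan rereads every document per query. That trade-off is why the choice of data structure varies with the data set and query context.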
Personalized IR system design
• How would you design a personal IR system?
• Who would use it?
• How would you learn about them?
  - Interests
  - Sources
  - Preferences
• How do you evaluate a personal system?
• Understanding users is the key to personalizing search or search interfaces.
Information Seeking in Context
[Diagram labels: Learning, Information Seeking, Information Retrieval, Analytical Strategy, Browsing Strategy]
How do we search?
• Analytical
  - careful planning
  - recall of query terms
  - iterative query reformulations
  - examination of results
  - batched
• Browsing
  - heuristic
  - opportunistic
  - recognizing relevant information
  - interactive (as can be)
Behavioral Model
• Recurring Web behavioral patterns that relate people’s browser actions (Web moves) to their browsing/searching context (Web modes)
• Modes of scanning: Aguilar (1967) & Weick & Daft (1983, 1984)
• Moves in information seeking behavior: Ellis (1989) & Ellis et al. (1993, 1997)
ISeek Behaviors & Web Moves
What do I use?
• Starting
  - Bookmarks and groups of bookmarks
  - Search javascripts
• Chaining
  - Tabbed windows
  - Bookmarking
  - Printing
• Browsing and Differentiating
  - Firefox/Mozilla & recommended links
  - Blogrolls and PageRank
• Monitoring
  - RSS feeds with RSS reader
  - (Moderated) Listservs
• Extracting
  - Saving as HTML, Text, or PDF
  - Bookmarking & Printing
How do we really use the Web?
• People don’t read, they scan Web pages
• We move quickly, we know we can go back
• Quick experimentation & short memory
• Behaviors that work are reinforced & continued
• Satisficing makes measures of quality difficult
How do I use the Web?
• Set of standard, daily Web pages
• Set of “occasional” Web pages
  - Fridays - movie reviews, show times, previews
  - Monthly - stocks and funds
• Quick focus on a subject, build a set of documents related to that and file for later use
• I scan quickly down the page and then back up the page
• Site maps, other links, walk up the URL
Future: Social Issues
• Who controls the sharing?
• Who controls the controls?
• “Give to get” systems
• Anonymity vs. Community
  - Community of “friends”
  - People as data points
• Free riders
• Logrolling and Over-rating
Future: Filtering for IR
• How about filtering, without the collaboration?
  - Individual preferences
  - Implicit and Explicit
• Text is analyzed
  - Feature extraction
  - Recall & precision measures
• New models for multidimensional users/uses/ratings
• Relevance Feedback
  - Faster matching, more accurate
  - Metadata (use data, preferences)
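The recall and precision measures mentioned above have standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch follows. The function name and the sample document ids are assumptions for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical filter output: four documents retrieved, three judged relevant.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67). A filter tuned on relevance feedback would typically try to raise both.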
Future: Community Centered CF
• Forming and keeping community
  - Interfaces, functionality
• Helping people find new information
  - Interactive search
  - Group browsing
• Mapping community (prefs?)
  - Daily News
• Rating Web pages
  - Incenting users to share
• Providing access to stored preferences
  - Fair, open data collection
  - Users can tune data
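The slides do not name a particular collaborative filtering algorithm, so the following is only one possible sketch of how shared page ratings could help people find new information: a user-based approach that weights other users' ratings by cosine similarity. The user names, page names, ratings, and function names are all hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two users over the pages both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(a[p] ** 2 for p in shared))
    norm_b = sqrt(sum(b[p] ** 2 for p in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user, page):
    """Similarity-weighted average of other users' ratings for `page`.

    Returns None when no other user has rated the page.
    """
    weighted, total = 0.0, 0.0
    for other, their in ratings.items():
        if other == user or page not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        weighted += sim * their[page]
        total += sim
    return weighted / total if total else None


# Hypothetical stored preferences: user -> {page: rating on a 1-5 scale}.
ratings = {
    "ann": {"a.html": 5, "b.html": 1},
    "bob": {"a.html": 5, "b.html": 1, "c.html": 4},
    "cat": {"a.html": 1, "b.html": 5, "c.html": 2},
}

print(predict_rating(ratings, "ann", "c.html"))
```

Because ann's ratings agree with bob's far more than with cat's, the prediction for `c.html` lands closer to bob's rating of 4 than to cat's rating of 2. Since each user's ratings are plain data, this kind of store could also be exposed for users to inspect and tune.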
WWW Documents Investigation
• How do you collect data like this?
  - Web Crawler
    • URL identifier, link follower
  - Index-like processing
    • Markup parser, keyword identifier
    • Domain name translation (and caching)
• How do these facts help with indexing?
• Have general characteristics changed?
• (This would be a great project to update.)
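The crawler pieces listed above (link follower, markup parser, keyword identifier) can be sketched for a single page with Python's standard library. This is an assumption-laden illustration rather than the tool used in the original study: the class name `PageParser`, the helper `parse_page`, and the sample HTML are invented here, and the page is parsed from a string so that no network access is needed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect outgoing links and text words from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # absolute URLs a link follower could visit next
        self.words = []   # lowercase words for index-like processing

    def handle_starttag(self, tag, attrs):
        # Link follower: resolve each <a href> against the page's own URL.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Keyword identifier: a plain whitespace split of the text content.
        self.words.extend(data.lower().split())


def parse_page(base_url, html):
    """Parse one page and return (links, words)."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.words


# Hypothetical page content, invented for illustration.
html = '<html><body><p>Austin restaurants</p><a href="/food">Food</a></body></html>'
links, words = parse_page("http://example.org/index.html", html)
print(links)
print(words)
```

A full crawler would add a queue of URLs to visit, a set of URLs already seen, and cached domain name lookups. The words collected here could feed an index directly.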
Metadata
• Information that describes a document that is not (necessarily) in the document
• Describes the document in relation to other documents
• Context about the Content
• Document semantics
• Internally consistent descriptions of content for individual documents, document sets or a specified set of content.
• For collections or individual documents
Metadata Types
• Dublin Core elements
• MARC (machine readable cataloging)
  - What isn’t machine readable?
• Semantic Web elements
• Bottom-up, derived data
• Format-based
  - ASCII, EBCDIC
  - RTF
  - PostScript, PDF
  - MIME
Digital Libraries
• We all have them
  - Email boxes, archives
  - Papers written
  - Bookmarks
• What I have
  - 4GB of academic & technical papers
    • Mostly PDF, HTML, text
  - Indexed using Adobe Catalog, htDig, OS X Search
  - Data sets from previous studies
  - Program code
  - Scanned documents
Big DigLib Questions
• What’s a document?
  - A file or link
• How do you trace & track the information source?
  - Filenames, memory, metadata
• How do you integrate the variety of documents & metadata?
  - Stick to standard formats
• What kind of storage model?
  - Version Control system
  - Server storage
  - Filenames and directories
• When do you Index?
  - Continuously
  - After a backup
• Mostly boolean searching with attributes
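Boolean searching with attributes over a personal collection might look like the sketch below. The `Record` fields, the `search` function, and the sample file paths are all hypothetical. Real records would more likely come from file metadata or an index such as the ones named earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical metadata record for one file in a personal collection."""
    path: str
    fmt: str
    keywords: set = field(default_factory=set)


def search(records, all_of=(), none_of=(), fmt=None):
    """Boolean search: every `all_of` keyword, no `none_of` keyword,
    and (optionally) a matching format attribute."""
    results = []
    for rec in records:
        if fmt is not None and rec.fmt != fmt:
            continue                      # attribute filter
        if not set(all_of) <= rec.keywords:
            continue                      # AND: all required keywords present
        if set(none_of) & rec.keywords:
            continue                      # NOT: no excluded keyword present
        results.append(rec.path)
    return results


# Hypothetical collection, invented for illustration.
records = [
    Record("papers/ir-models.pdf", "pdf", {"retrieval", "ranking"}),
    Record("papers/cf-survey.pdf", "pdf", {"filtering", "ranking"}),
    Record("notes/crawler.html", "html", {"retrieval", "crawler"}),
]

print(search(records, all_of=["retrieval"], fmt="pdf"))
print(search(records, all_of=["ranking"], none_of=["filtering"]))
```

Keeping the format as an explicit attribute rather than just another keyword is one way to combine documents of varied types while still sticking to standard formats.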
Course Evaluations (next week)
• Volunteer to get, distribute, collect and turn in evaluations
• Overall level of class expertise relevant for you?
• Favorite readings – type of readings?
• Least favorite (obscure – difficult) readings?
• Project ideas and group organization tools?
• Assignments: Group Work vs. Papers?