Document Collections cs5984: Information Visualization Chris North

Page 1

Document Collections

cs5984: Information Visualization

Chris North

Page 2

Where are we?

• Multi-D

• 1D

• 2D

• Hierarchies/Trees

• Networks/Graphs

• Document collections

• 3D

• Design Principles

• Empirical Evaluation

• Java Development

• Visual Overviews

• Multiple Views

• Peripheral Views

Page 3

Structured Document Collections

• Multi-dimensional
  • author, title, date, journal, …

• Trees
  • Dewey Decimal

• Networks
  • web, citations

Page 4

Envision

• Ed Fox, et al.

• Multi-D

• similar to Spotfire

Page 5

Unstructured Document Collections

• Focus on Full Text

• Examples:
  • digital libraries, encyclopedia
  • Web, homepages, photo collections

• Tasks:
  • search, keyword
  • Browse
  • Themes, subjects, topics, library coverage
  • Size, distributions

Page 6

Visualization Strategies

• Cluster Maps

• Keyword Query

• Relationships

• Reduced representation

• User controlled layout

Today: Cluster Maps, Keyword Query

Page 7

Cluster Map

• Create a “map” of the document collection

• Similar documents near

• Dissimilar documents far

• “Grocery store” concept

Page 8

Document Vectors

             Doc1  Doc2  Doc3  …
“aardvark”    1     2     0
“banana”      2     1     0
“chris”       0     0     3
…

• Similarity between pair of docs = [formula not preserved in source]

• Lay out documents in 2-D map by similarity
  • similar to spring model for graph layout
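
The similarity formula on this slide did not survive extraction. Purely as an illustration, the sketch below assumes cosine similarity between term-count vectors, which is a common choice but not necessarily the measure used in the lecture. The function names `document_vector` and `cosine_similarity` are hypothetical; the term counts are the ones shown on this slide.

```python
import math

# Term counts from the slide: one row per term, one column per document.
term_counts = {
    "aardvark": [1, 2, 0],
    "banana":   [2, 1, 0],
    "chris":    [0, 0, 3],
}

def document_vector(doc_index):
    """Return one document's column of term counts as a list."""
    return [counts[doc_index] for counts in term_counts.values()]

def cosine_similarity(u, v):
    """Cosine of the angle between two term vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)

doc1, doc2, doc3 = (document_vector(i) for i in range(3))
sim_12 = cosine_similarity(doc1, doc2)  # share "aardvark" and "banana"
sim_13 = cosine_similarity(doc1, doc3)  # share no terms
print(round(sim_12, 3), round(sim_13, 3))
```

Under this assumption, Doc1 and Doc2 score high because they use the same terms, while Doc3 shares no terms with either and scores zero, so a layout driven by these values would place Doc1 and Doc2 close together and Doc3 far away.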

Page 9

Cluster Algorithms

• Partition clustering: Partition into k subsets

• Pick k seeds

• Iteratively attract nearest neighbors

• Hierarchical clustering: Dendrogram

• Group nearest-neighbor pair

• Iterate
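
The two families on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: the partition step is read as a k-means-style loop (seeds attract their nearest points, then move to the mean of what they attracted), and the hierarchical step uses single-linkage merging. All function names are hypothetical, and the fourth sample document is invented for the example.

```python
import math

def distance(p, q):
    """Euclidean distance between two document vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def partition_clustering(points, seeds, iterations=10):
    """k-means-style partitioning into len(seeds) subsets."""
    centers = [list(s) for s in seeds]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Each point joins the cluster of its nearest center.
            nearest = min(range(len(centers)),
                          key=lambda i: distance(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if nothing was attracted
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return clusters

def hierarchical_clustering(points):
    """Single-linkage agglomeration; the merge list encodes a dendrogram."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(p, q)
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# First three vectors follow the Document Vectors slide; the fourth is made up.
docs = [(1, 2, 0), (2, 1, 0), (0, 0, 3), (0, 1, 3)]
parts = partition_clustering(docs, seeds=[docs[0], docs[2]])
merges = hierarchical_clustering(docs)
print(parts)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 3))
```

The partition version needs k chosen up front and depends on the seeds; the hierarchical version needs no k but compares every pair of clusters at each step, which is costly for large collections.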

Page 10

Kohonen Maps

• Xia Lin, “Document Space”
  • samal, ying

• http://faculty.cis.drexel.edu/sitemap/index.html

Page 11

Page 12

Themescapes, Cartia

• PNL

• Mountain height = Cluster size

Page 14

Map.net

• http://maps.map.net/start

Page 15

Cluster Map

• Good:
  • Map of collection
  • Major themes and sizes
  • Relationships between themes
  • Scales up

• Bad:
  • Where to locate documents with multiple themes?
    » Both mountains, between mountains, …?
  • Relationships between documents, within documents?
  • Algorithm becomes (too) critical

Page 16

Keyword Query

• Keyword query, Search engine
  • Rank-ordered list

• “Information Retrieval”

Page 17

Tilebars

• Hearst, “Tilebars”
  • reenal, xueqi

• http://elib.cs.berkeley.edu/tilebars/

Page 18

VIBE

• Korfhage, http://www.pitt.edu/~korfhage/interfaces.html

• Documents located between query keywords using spring model
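
One way to picture the spring model named on this slide: pin each query keyword to a fixed point, attach every document to every keyword with a spring whose strength is the document's weight for that keyword, and let the document settle. Assuming zero-rest-length springs, the resting point is the weighted average of the keyword positions. The sketch below uses that assumption; the keyword positions, the use of raw term counts as weights, and the name `place_document` are all hypothetical and not taken from VIBE itself.

```python
def place_document(keyword_positions, keyword_weights):
    """Return the (x, y) resting point of a document pulled toward each
    keyword anchor with a force proportional to its keyword weight."""
    total = sum(keyword_weights.values())
    if total == 0:
        return None  # the document matches none of the query keywords
    x = sum(w * keyword_positions[k][0]
            for k, w in keyword_weights.items()) / total
    y = sum(w * keyword_positions[k][1]
            for k, w in keyword_weights.items()) / total
    return (x, y)

# Hypothetical anchor points for three query keywords.
keyword_positions = {
    "aardvark": (0.0, 0.0),
    "banana":   (1.0, 0.0),
    "chris":    (0.5, 1.0),
}

# Term counts reused from the Document Vectors slide as keyword weights.
doc_weights = {
    "Doc1": {"aardvark": 1, "banana": 2, "chris": 0},
    "Doc2": {"aardvark": 2, "banana": 1, "chris": 0},
    "Doc3": {"aardvark": 0, "banana": 0, "chris": 3},
}

layout = {name: place_document(keyword_positions, weights)
          for name, weights in doc_weights.items()}
for name, position in layout.items():
    print(name, position)
```

In this sketch Doc3 sits exactly on the “chris” anchor because it matches only that keyword, while Doc1 and Doc2 land on the line between “aardvark” and “banana”, each closer to the keyword it uses more often.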

Page 19

VR-VIBE

Page 20

Keyword Query

• Good:
  • Reduces the browsing space
  • Map according to user’s interests

• Bad:
  • What keywords do I use?
  • What about other related documents that don’t use these keywords?
  • No initial overview
  • Mega-hit, zero-hit problem

Page 21

Assignment

• Thurs: Document Collections
  • Bederson, “Image Browsing”
    » Rui, anusha
  • Card, “Web Book and Web Forager”
    » mrinmayee, ming

• Demo your hw3: tues or thurs

Page 22

Next Week

• Tues: 3-D data
  • Kniss, “Interactive Volume Rendering with Direct Manip”
    » xueqi, mahesh

• Thurs: Workspaces
  • Robertson, “Task Gallery”
    » supriya, varun
  • Upson, “AVS”
    » christa, jun

• Thanksgiving break

• Tues 27: Debates
  • Kobsa, “Empirical comparison of comm infovis systems”
    » kunal, zhiping

Page 23

Upcoming Schedule

• Tues: 3-D data

• Thurs: Workspaces

• Thanksgiving break

• Tues 27: Debates

• Thurs 29: How (not) to lie with visualization

• Dec: project presentations

• Dec 7: CHI 2-pagers due, student posters due