Visualizing Information: Using WebTheme to Visualize Internet Search Results

Karen Buxton and Mary Frances Lembo

The Value of Information: American Society for Information Science, Pacific Northwest Chapter, Fall Meeting

Sept. 20-21, 2002

PNNL-SA-36456

Presentation Overview

Brief Overview of Information Visualization

Introduction to WebTheme

Preparing a WebTheme Query

Exploring a Dataset

Question & Answer

Information Visualization

What is an information visualization? A visual representation of data that allows the user to navigate through large datasets more quickly and gain additional insight about the data.

What types of data can be used? Text, image data, numerical data, etc.

How Is Information Visualization Used?

Battlefield Awareness

Business Intelligence

Enterprise Knowledge Management

Environmental Security

Intellectual Asset Management

Intelligence Analysis

Law Enforcement

Market Assessment

Medical Informatics

Medical Research

Nuclear Non-Proliferation

Research Program Management

Science and Technology Scanning

Translingual Text Analysis

Information Visualization at PNNL

Analyzes large volumes of text

Displays related documents and themes as star clusters and terrain maps

SPIRE Related Technologies

WebTheme

Galaxies, ThemeView, Correlation Tool, Starlight

What is WebTheme?

Web-enabled version of SPIRE

Harvests data from the World Wide Web by using search terms, or by following links derived from user-specified URLs
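
As a rough illustration of what "following links derived from user-specified URLs" involves, the sketch below pulls the link targets out of one page using only the Python standard library. It is not WebTheme's harvester, whose internals the slides do not describe; the names `LinkCollector` and `extract_links` and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute link targets from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html_text):
    """Return the links found in html_text, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical page content; a real harvester would download the page first.
sample_html = '<p><a href="/about.html">About</a> <a href="http://example.org/x">X</a></p>'
links = extract_links("http://example.com/index.html", sample_html)
```

A harvester would repeat this step on each discovered link, up to whatever depth the harvest settings allow.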

Licensing

Government Agencies (NOT Contractors)

WebTheme use agreement available at no cost!

Installation and training agreement available for a fee

Non-Governmental Organizations

Negotiate a contract

Recommend installation and training agreement

Why Use WebTheme?

Investigate and characterize websites

Investigate a new technology

Find key players in a particular field

Find opportunities for collaboration

Using WebTheme

Preparing a WebTheme Query

Planning a Query

Creating a Dataset

Exploring a Dataset

Using WebTheme Tools

Exploring a Galaxy

Exploring ThemeView

Planning a WebTheme Query

Decide How to Gather the Data

Experiment with Search Engine Queries (Google, AltaVista)

Examine Search Results; Modify Query If Needed

Exploration of the Site or Links: URL List

WebTheme

Create a New Data Set

Create a Search Query or Follow a URL List?

Harvest Settings

Advanced Options for Harvesting

Filters

Processing

Galaxies Layout

White Dots = Documents

Location Has Meaning: Proximity, Distance

Degree of Thematic Concentration: Topic Strength & Number of Documents

Galaxies Clouds = ThemeView Mountains

Note Instructions at Bottom of Window

WebTheme Toolbar

Exploring Galaxies

Cluster Centroids: Click on Centroid Circle to See Cluster Terms

Thematic Labels Indicate Dominant Themes in Clouds

Viewing Document Titles

Select: Click +S icon, Drag Select to Choose a Group of Documents

View Document Titles: Click +Ab icon, Click on dots to reveal titles

Viewing Documents

Document Viewer

Search for Words in a Document

View in Browser

Link Mode

Must Turn on Link Mode BEFORE Processing

Arrows Indicate Links from One Page to Another

Circle Indicates No Links from Page
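
The two Link Mode markers can be pictured with a small link table. The sketch below is only an illustration of the idea, not WebTheme's data model: the `page_links` dictionary and both function names are hypothetical.

```python
# Hypothetical harvested pages, each mapped to the pages it links to.
page_links = {
    "home": ["products", "contact"],
    "products": ["home"],
    "contact": [],
}


def link_arrows(links):
    """Return (source, target) pairs, one per arrow drawn between pages."""
    return [(src, dst) for src, targets in links.items() for dst in targets]


def pages_without_links(links):
    """Return the pages with no outgoing links (the ones shown with a circle)."""
    return sorted(page for page, targets in links.items() if not targets)


arrows = link_arrows(page_links)
dead_ends = pages_without_links(page_links)
```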

Probe Tool

To Use, Select the Probe Button (+P), Left Click to Probe Region.

Shows a Graphical Representation of Topics at Designated Location

Value Indicates Relative Topic Strength

Gisting Tool

To Use: Select Documents to Gist, Click the Gist Button (%)

Shows Top 50 Topics in Selected Documents

Identify Terms of Interest

Copy Terms to Clipboard Window
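
To give a feel for what a gist of selected documents contains, the sketch below ranks terms by plain frequency. This is a stand-in: the slides do not say how WebTheme computes topics, and the stop list, the `gist` function, and the sample texts are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop list; an assumption, not WebTheme's own.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def gist(documents, top_n=50):
    """Return the top_n most frequent non-stop-word terms across the documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Hypothetical selection of three documents.
selected = [
    "Visualization of text data",
    "Text analysis and visualization tools",
    "Tools for text harvesting",
]
top_terms = gist(selected, top_n=3)
```

Here "text" ranks first because it appears in all three sample documents.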

Query Tool

Word Query: Selects Documents that Contain All Query Words

Click “Group Results” to create a set that contains search terms
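
The "contain all query words" rule amounts to a subset test on each document's words. The sketch below shows that rule on made-up data; `word_query` and the sample documents are hypothetical, and WebTheme's own tokenization may differ.

```python
import re


def word_query(documents, query):
    """Return titles of documents whose text contains every query word."""
    wanted = set(query.lower().split())
    matches = []
    for title, text in documents.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        # A document matches only if no query word is missing from it.
        if wanted <= words:
            matches.append(title)
    return matches


# Hypothetical dataset: title -> document text.
docs = {
    "doc1": "Information visualization at PNNL",
    "doc2": "Visualization of search results",
    "doc3": "Search engines and information retrieval",
}
both_words = word_query(docs, "information visualization")
one_word = word_query(docs, "search")
```

Adding words to the query can only shrink the result, since every added word is one more condition each document must meet.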

Query Tool

Query By Example

Looks for text similar to the example

Determines Location of Greatest Term Strength

Use Slider to Increase Number Selected
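
One common way to find "text similar to the example" is cosine similarity over term counts, sketched below. This technique is swapped in for illustration only; the slides do not state which similarity measure WebTheme uses. The `how_many` parameter plays the role of the slider, and all names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count the lowercase alphabetic terms in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def query_by_example(documents, example, how_many=1):
    """Return the how_many document titles most similar to the example text."""
    target = term_vector(example)
    ranked = sorted(
        documents,
        key=lambda title: cosine(term_vector(documents[title]), target),
        reverse=True,
    )
    return ranked[:how_many]


# Hypothetical dataset: title -> document text.
docs = {
    "a": "nuclear non-proliferation treaty analysis",
    "b": "medical research on imaging",
    "c": "analysis of nuclear policy",
}
closest = query_by_example(docs, "nuclear treaty analysis", how_many=1)
closest_two = query_by_example(docs, "nuclear treaty analysis", how_many=2)
```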

Group Tool

Create subsets: Retrieved from Query, Selected in Galaxy

Dots Change Color to Reflect Group Membership

Combine Sets: Select, Disjunction, Intersection, Union
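
Combining groups behaves like ordinary set arithmetic, which the sketch below shows with Python sets. The group names and contents are hypothetical. The slide does not define "Disjunction"; treating it as the symmetric difference (documents in exactly one of the two groups) is an assumption.

```python
# Hypothetical groups: one retrieved from a query, one selected in the Galaxy.
query_group = {"doc1", "doc2", "doc3"}
galaxy_group = {"doc3", "doc4"}

# Union: documents in either group.
union = query_group | galaxy_group

# Intersection: documents in both groups.
intersection = query_group & galaxy_group

# ASSUMPTION: "Disjunction" is read here as symmetric difference,
# i.e. documents that belong to exactly one of the two groups.
disjunction = query_group ^ galaxy_group
```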

ThemeView

When to Reprocess

You Get Too Many Clusters that Are Too Similar: Reduce Number of Clusters Requested

You Get Big Clusters with Too Many Unrelated Documents: Increase Number of Clusters Requested

Questions?

Mary Frances Lembo, mf.lembo@pnl.gov

Karen Buxton, karen.buxton@pnl.gov

Battelle

U.S. Department of Energy

Pacific Northwest National Laboratory