1 Interfaces for Intense Information Analysis Marti Hearst UC Berkeley This research funded by ARDA



1

Interfaces for Intense Information Analysis

Marti Hearst
UC Berkeley

This research funded by ARDA

2

Outline

• A contrast
  – Search vs. Analysis

• Goals for three user groups
  – Intelligence Analysts
  – Biomedical Researchers
  – Investigative Reporters

• Our current interface design

3

Search vs. Analysis

Search: Finding hay in a haystack

Analysis: Creating new hay

4

UIs for Search vs. Analysis

• Search:
  – A necessary but undesirable step in a larger task
  – UI should not draw attention to itself
  – UI should be very easy to use for everyone

• Analysis:
  – The larger task
  – UI can be more of a “science project”
  – But UI should have “flow”

5

General Goals

• Support hypothesis formation / refutation

• Flow
  – Easy creation, destruction, and cataloging of connections and coverage
  – Easy movement between multiple views

• Represent:
  – Multiple supporting clues
  – Conflicting evidence
  – Uncertainty
  – Timeliness
  – Non-monotonicity
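
One way to make the representation goals concrete is a small record type per clue. The sketch below is an illustration only, not the project's data model: the names `Evidence`, `Hypothesis`, and `score`, and the idea of summing signed confidence weights, are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Evidence:
    source: str              # where the clue came from (hypothetical label)
    supports: bool           # True = supporting clue, False = conflicting evidence
    confidence: float        # uncertainty, as a rough weight in [0, 1]
    observed: date           # timeliness: when the clue was observed
    retracted: bool = False  # non-monotonicity: a clue can later be withdrawn


@dataclass
class Hypothesis:
    statement: str
    evidence: List[Evidence] = field(default_factory=list)

    def add(self, item: Evidence) -> None:
        """Attach one more clue to this hypothesis."""
        self.evidence.append(item)

    def score(self) -> float:
        """Net weight of non-retracted evidence: support minus conflict."""
        active = [e for e in self.evidence if not e.retracted]
        return sum(e.confidence if e.supports else -e.confidence for e in active)
```

Retracting a supporting clue lowers the score, which is the non-monotonic behavior the list asks for: a conclusion drawn earlier can be weakened by later information.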

6

Intelligence Analysts

7

Intelligence Analysts

• I have recently interviewed several active counter-terrorist analysts

• Great diversity in
  – Goals
  – Computing environments

• Biggest problems are social/systemic

• Many mundane IT problems as well

8

Mundane IT Problems

• System incompatibilities
• Data reformatting
• Data cleaning
• Documenting sources
• Archiving materials

9

Intelligence Analysts: Problem 1

• Look at a series of reports, images, communication patterns;

• Try to build a model of what is going on
  – Follow leads
  – Compare to previous situations

• Recent problem:
  – Groups are changing their behavior patterns quickly

• Very little use of sophisticated software tools

10

Intelligence Analysts: Problem 2

• Given a large collection

• “Roll around” in the data
  – See what has been “touched”
    • Tools should indicate which parts of the collection have been examined and which have yet to be looked at, and by whom
  – View data in several different ways
    • Data reduction methods such as MDS, SVD, and clustering often hide important trends.
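
The "touched" bookkeeping described above could be kept in a structure like the following. This is a minimal sketch under assumptions: the class name `CoverageTracker`, its methods, and the use of string identifiers for documents and analysts are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class CoverageTracker:
    """Records which documents in a collection were examined, and by whom."""

    def __init__(self, doc_ids: Iterable[str]) -> None:
        self.doc_ids: Set[str] = set(doc_ids)
        self.touched_by: Dict[str, Set[str]] = defaultdict(set)

    def touch(self, doc_id: str, analyst: str) -> None:
        """Mark a document as examined by the given analyst."""
        if doc_id not in self.doc_ids:
            raise KeyError(doc_id)
        self.touched_by[doc_id].add(analyst)

    def examined(self) -> Set[str]:
        """Documents that at least one analyst has looked at."""
        return {d for d, who in self.touched_by.items() if who}

    def unexamined(self) -> Set[str]:
        """Documents nobody has looked at yet."""
        return self.doc_ids - self.examined()

    def viewers(self, doc_id: str) -> Set[str]:
        """Analysts who have looked at one document."""
        return set(self.touched_by.get(doc_id, set()))
```

An interface could shade or badge items using `unexamined()` and `viewers()`.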

11

Intelligence Analysts: Problem 2

  – Don’t show the obvious
    • e.g., Cheney is president
  – Don’t show what you’ve already shown
  – Only show the most recent version
  – Show which info is not present
    • Changes in the usual pattern
    • Something stops happening

12

Intelligence Analysts: Problem 3

• Prepare a very short executive summary for the purposes of policy making
  – Really the culmination of a cascade of summaries
  – Reps from different agencies meet and “pow-wow” to form a view of the situation
  – Rarely, but crucially, must be able to refer back to original sources and reasoning process for purposes of accountability

13

BioInformatics Researchers

14

BioInformatics Example 1

• How to discover new information …

• … As opposed to discovering which statistical patterns characterize occurrence of known information.

• Method:
  – Use large text collections to gather evidence to support (or refute) hypotheses
  – Make Connections
  – Gather Evidence

15

Etiology Example

• Don Swanson example, 1991

• Goal: find cause of disease
  – Magnesium-migraine connection

• Given
  – medical titles and abstracts
  – a problem (incurable rare disease)
  – some medical expertise

• find causal links among titles
  – symptoms
  – drugs
  – results

16

Gathering Evidence

[Diagram: migraine shown with the terms stress, CCB, PA, and SCD; each of these terms is paired with magnesium.]

17

Gathering Evidence

[Diagram: migraine and magnesium connected through the intermediate terms stress, CCB, PA, and SCD.]

18

Swanson’s Linking Approach

• Two of his hypotheses have received some experimental verification.

• His technique
  – Only partially automated
  – Required medical expertise
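
The linking idea can be illustrated with a toy co-occurrence search: find intermediate terms that appear with migraine in some titles and with magnesium in others. This is a simplified sketch and not Swanson's procedure, which the slide notes was only partially automated and needed medical expertise. The function names and the term sets below are invented for illustration; the pairs only mirror the diagram's labels.

```python
from typing import List, Set


def cooccurring_terms(docs: List[Set[str]], term: str) -> Set[str]:
    """All terms that share at least one document with `term`."""
    found: Set[str] = set()
    for terms in docs:
        if term in terms:
            found |= terms - {term}
    return found


def linking_terms(docs: List[Set[str]], a: str, c: str) -> Set[str]:
    """Intermediate terms that co-occur with both `a` and `c`."""
    return cooccurring_terms(docs, a) & cooccurring_terms(docs, c)


# Toy "titles", each reduced to a set of terms (invented for illustration).
toy_docs = [
    {"migraine", "stress"}, {"stress", "magnesium"},
    {"migraine", "CCB"}, {"CCB", "magnesium"},
    {"migraine", "PA"}, {"PA", "magnesium"},
    {"migraine", "SCD"}, {"SCD", "magnesium"},
    {"migraine", "unrelated_term"},  # distractor with no link to magnesium
]

links = linking_terms(toy_docs, "migraine", "magnesium")
```

In this toy data no single title mentions both migraine and magnesium, so the connection only appears through the intermediate terms.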

19

BioInformatics Example 2:

• How to find functions of genes?
  – Have the genetic sequence
  – Don’t know what it does
  – But …
    • Know which genes it co-expresses with
    • Some of these have known function
  – So … infer function based on function of co-expressed genes

• This is a problem suggested by Michael Walker and others at Incyte Pharmaceuticals
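
The inference step above can be sketched as a simple vote among co-expressed genes with known function. This is an assumed simplification for illustration, not the method used at Incyte: the function name `infer_function`, the majority-vote rule, and the placeholder gene and function labels in the usage note are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def infer_function(
    gene: str,
    coexpressed: Dict[str, List[str]],
    known_function: Dict[str, str],
) -> Optional[str]:
    """Guess a gene's function from the most common known function
    among the genes it co-expresses with; None if nothing is known."""
    votes = Counter(
        known_function[g]
        for g in coexpressed.get(gene, [])
        if g in known_function
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For example, with `coexpressed = {"g?": ["geneA", "geneB", "geneC"]}` and `known_function = {"geneA": "function X", "geneB": "function X"}`, the guess for `"g?"` is `"function X"`. The result is only a hypothesis to check against the literature or in the lab.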

20

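Gene Co-expression: Role in the genetic pathway

[Diagram: two candidate arrangements. One shows the unknown gene g? with PSA, Kall., and PAP; the other shows a second unknown gene h? with PSA, Kall., PAP, and g?.]

Other possibilities as well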

21

Make use of the literature

• Look up what is known about the other genes.

• Different articles in different collections

• Look for commonalities
  – Similar topics indicated by Subject Descriptors
  – Similar words in titles and abstracts

adenocarcinoma, neoplasm, prostate, prostatic neoplasms, tumor markers, antibodies ...
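
Looking for commonalities could start with counting how many of the known genes have a given subject descriptor somewhere in their literature. The sketch below assumes each article has already been reduced to a set of descriptors; the function name `common_descriptors` and the `min_genes` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def common_descriptors(
    articles_by_gene: Dict[str, List[Set[str]]],
    min_genes: int,
) -> List[str]:
    """Descriptors that appear in the literature of at least `min_genes` genes."""
    gene_count: Counter = Counter()
    for articles in articles_by_gene.values():
        # Count each descriptor once per gene, however many articles use it.
        seen = set().union(*articles)
        gene_count.update(seen)
    return sorted(d for d, n in gene_count.items() if n >= min_genes)
```

Counting per gene rather than per article keeps one heavily studied gene from dominating the result.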

22

23

Formulate a Hypothesis

• Hypothesis: mystery gene has to do with regulation of expression of genes leading to prostate cancer

• New tack: do some lab tests
  – See if mystery gene is similar in molecular structure to the others
  – If so, it might do some of the same things they do

24

Investigative Reporter Example

• Looking for trends in online literature

• Create, support, refute hypotheses

25

Investigative Reporter Example

What are the current main topics?

What are the new popular terms? How do they track with the news?

Clustering

Corpus-level statistics, Co-occurrence statistics

Contrasting collection statistics
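
Contrasting collection statistics can be illustrated by comparing relative term frequencies in an older and a newer collection. This is a rough sketch under assumptions: whitespace tokenization, the function names, the `min_ratio` threshold, and the `floor` used for terms unseen in the older collection are all choices made here for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def relative_freq(docs: Iterable[str]) -> Dict[str, float]:
    """Share of all tokens accounted for by each lowercased word."""
    counts = Counter(w for d in docs for w in d.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def rising_terms(
    old_docs: Iterable[str],
    new_docs: Iterable[str],
    min_ratio: float = 2.0,
    floor: float = 1e-6,
) -> List[str]:
    """Words whose relative frequency grew by at least `min_ratio`,
    ordered from largest growth to smallest."""
    old, new = relative_freq(old_docs), relative_freq(new_docs)

    def growth(word: str) -> float:
        return new[word] / max(old.get(word, 0.0), floor)

    rising = [w for w in new if growth(w) >= min_ratio]
    return sorted(rising, key=growth, reverse=True)
```

Words absent from the older collection get a very large ratio because of the floor, so they sort first as candidate "new popular terms".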

26

Investigative Reporter Example

How long after a new Star Trek series comes on the air before characters from the series appear in stories?

How often do Klingons initiate attacks against Vulcans, vs. the converse?

Named-entity recognition

Creating a list of terms

Apply the list to a Subcollection

Create regex rules with POS information
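
Applying a term list to a subcollection could look like the sketch below, which builds one regular expression from the list and counts matches. It covers only the term-list step: part-of-speech information would need a tagger and is left out. The function name `count_terms` and the sample sentences in the usage note are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List


def count_terms(docs: Iterable[str], terms: List[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each listed term."""
    counts: Counter = Counter()
    if not terms:
        return counts
    # Escape each term so punctuation in a term is matched literally.
    alternatives = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    for text in docs:
        for match in pattern.findall(text):
            counts[match.lower()] += 1
    return counts
```

For instance, `count_terms(["Klingons attack Vulcans."], ["Klingons", "Vulcans"])` counts one occurrence of each term, keyed in lowercase.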

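[Screenshot: LINDI interface. Menu bar: LINDI, File, Help. Visible labels: Summary, Query, Analysis, Term Set, Document Set. Default entries: "All terms: *" and "All documents: *". Other labels: New, Merge. Example term definitions: "Diseases: emphysema cancer hypertension …" and "WHO: organization = world health organization".]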

28

Thank you!

For more information:

bailando.sims.berkeley.edu/lindi.html