Data Mining David Eichmann School of Library and Information Science The University of Iowa



Why?

Given enough data represented through enough dimensions, we lose the ability to see the patterns.


How?

- Decision Trees
- Nearest Neighbor Clustering
- Neural Networks
- Rule Induction
- K-Means Clustering
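
Of the techniques listed, k-means clustering is probably the easiest to show in a few lines. The following is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm), not anything taken from the talk: the function name `k_means`, the sample points, and the choice of starting centroids are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, centroids, max_iter=100):
    """Alternate between assigning points and moving centroids until stable."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = updated
    return centroids, clusters


# Hypothetical 2-D data: two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
```

Passing the starting centroids in explicitly keeps the sketch deterministic; practical implementations usually pick them at random and rerun several times.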


What is it?

The automated extraction of hidden predictive information from databases.

Key points
- Automated
- Hidden
- Predictive


The Typical Process


Evaluation Criteria

Receiver Operating Characteristic Curves
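
A receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate as the decision threshold is lowered, and the area under it (AUC) summarises ranking quality in one number. The sketch below shows one way to compute both in plain Python; the function names `roc_curve` and `auc` and the labels and scores are made up for illustration, and the code assumes both classes are present.

```python
def roc_curve(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels are 1 (positive) or 0 (negative); higher scores mean
    "more likely positive". Assumes both classes occur at least once.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank by descending score: lowering the threshold admits items in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier output for six items.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_curve(labels, scores)
area = auc(points)  # 8/9 here: 8 of the 9 positive/negative pairs are ranked correctly
```

An AUC of 1.0 means every positive outranks every negative; 0.5 is what random scoring would give on average.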


But Nobody Said We Had To Do MATH….


Forms of Data

- Structured
  - Databases
  - Forms
- Semi-Structured
  - Tables on the Web
  - Bibliographic citations
  - Graphs & charts
- Unstructured
  - Full text (e.g., journal articles, physician chart notes)
  - Images


Text Mining

- The corpus is now a collection of text artifacts
  - Full text when you’ve got it (e.g., newswire)
  - Metadata when you don’t (e.g., MEDLINE)
- The trick then becomes extracting ‘interesting’ relationships between ‘interesting’ entities
  - Who killed whom
  - Who works for whom
  - Who makes what


The Classic Entities

- Persons
- Organizations
- Places (Geography)
- Events


A Newswire Example

APW19981001.0262 [Israel (0.271), Jonathan Pollard (0.153), Benjamin Netanyahu (0.102), Bill Clinton (0.102), United States (0.055), ...]

Persons
- Bill Clinton (3)
- Jonathan Pollard (8)
- Moshe Fogel (2)
- Benjamin Netanyahu (2)
- Israeli Embassy (1)

Organizations
- Cabinet (1)

Places
- Israel (16)
- United States (5)
- Washington (2)


In the Medical/Health Realm

UMLS an excellent framework
- Organism
- Chemical
- Activity
- Disease


A MEDLINE Example

Document: 89316090 - Reconstructive surgery in Nicaragua

Provided MeSH Keywords
- Human
- Nicaragua Z01.107.169.690
- Surgery, Plastic/* G02.403.810.788

Phrases
- [Reconstructive, surgery] [Nicaragua] [letter]

MeSH Terms
- Surgery (1) G02.403.810.762
- Letter [Publication Type] (1)

Other Phrases
- Reconstructive surgery (1)


Concept Extraction Example

“Roman forces under Julius Caesar invade Britain.”

(S (NP (NP Roman forces) (PP under (NP Julius Caesar))) (VP invade (NP Britain)) .)

Entity Attributes: <organization Roman forces> <person Julius Caesar> <placename Britain>

Concepts: <Roman forces - under - Julius Caesar> <Roman forces - invade - Britain>
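
One way to get from the bracketed parse to the two concepts is to walk the tree and pair each noun phrase with a sibling prepositional or verb phrase. The sketch below does that with two toy rules. It is an illustration under stated assumptions, not the extraction system used in the talk: the helper names (`parse`, `children`, `words`, `head_np`, `concepts`) are hypothetical, and it does not assign the entity types (organization, person, placename).

```python
import re


def parse(text):
    """Parse a bracketed tree into nested lists: [label, child, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # consume "("
        tree = [tokens[pos]]              # constituent label, e.g. "NP"
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                tree.append(node())       # nested constituent
            else:
                tree.append(tokens[pos])  # a plain word
                pos += 1
        pos += 1                          # consume ")"
        return tree

    return node()


def children(tree, label):
    """Direct sub-constituents with the given label."""
    return [c for c in tree[1:] if isinstance(c, list) and c[0] == label]


def words(tree):
    """Words sitting directly under this constituent."""
    return [c for c in tree[1:] if isinstance(c, str)]


def head_np(np):
    """Follow nested NPs down to the innermost leading NP and return its text."""
    inner = children(np, "NP")
    return head_np(inner[0]) if inner else " ".join(words(np))


def concepts(tree):
    """Collect (left, relation, right) triples, innermost constituents first."""
    found = []
    for child in tree[1:]:
        if isinstance(child, list):
            found.extend(concepts(child))
    nps = children(tree, "NP")
    if nps:
        left = head_np(nps[0])
        # Toy rules: NP + PP gives <NP - preposition - NP>; NP + VP gives <NP - verb - NP>.
        for rel in children(tree, "PP") + children(tree, "VP"):
            targets = children(rel, "NP")
            if words(rel) and targets:
                found.append((left, words(rel)[0], head_np(targets[0])))
    return found


tree = parse("(S (NP (NP Roman forces) (PP under (NP Julius Caesar))) "
             "(VP invade (NP Britain)) .)")
triples = concepts(tree)
# [('Roman forces', 'under', 'Julius Caesar'), ('Roman forces', 'invade', 'Britain')]
```

Two rules are enough for this sentence; real text would need many more patterns, plus the entity typing step that this sketch leaves out.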


And a Small Demo…