© Jesse Davis 2006
View Learning Extended: Learning New Tables
Jesse Davis (1), Elizabeth Burnside (1), David Page (1), Vítor Santos Costa (2)
(1) University of Wisconsin-Madison, USA
(2) Federal University of Rio de Janeiro, Brazil
View Learning Framework [Davis et al. IJCAI05]

Abnormality  Patient  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
1            P1       5/02  No                         …  0.03       RU4  B
2            P1       5/04  Yes                        …  0.05       RU4  M
3            P1       5/04  No                         …  0.04       LL3  B
4            P2       6/00  No                         …  0.02       RL2  B
…            …        …     …                          …  …          …    …

Learn fields predictive of target concept
Extend Schema

Abnormality  Patient  Date  Calcification Fine/Linear  …  Mass Size  Increase in Size  Loc  Benign/Malignant
1            P1       5/02  No                         …  0.03       No                RU4  B
2            P1       5/04  Yes                        …  0.05       Yes               RU4  M
3            P1       5/04  No                         …  0.04       No                LL3  B
4            P2       6/00  No                         …  0.02       No                RL2  B
…            …        …     …                          …  …          …                 …    …
Integrated Search for New Fields
[Landwehr et al. AAAI 2005, Davis et al. ECML 2005]
Old approach:
  Step 1: use ILP to learn new fields
  Step 2: learn statistical model
Score As You Use (SAYU):
  Combine steps 1 and 2
  Score new field by how much it helps statistical model
Parallel development: nFOIL
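The "score as you use" idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the candidate fields are hard-coded instead of coming from an ILP search, and `toy_score` is a hypothetical stand-in for the statistical model (it predicts positive when any selected field fires and reports accuracy on a tuning set). The names `sayu_select`, `toy_score`, `TUNE`, and `CANDIDATES` are all assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, bool]
Feature = Tuple[str, Callable[[Example], bool]]


def toy_score(features: List[Feature], tune: List[Example]) -> float:
    """Hypothetical stand-in for the statistical model.

    Predicts positive when any selected feature fires and returns the
    accuracy of that prediction on the tuning examples.
    """
    correct = 0
    for ex in tune:
        predicted = any(fn(ex) for _, fn in features)
        correct += predicted == ex["label"]
    return correct / len(tune)


def sayu_select(candidates: List[Feature], tune: List[Example],
                min_gain: float = 0.0) -> Tuple[List[str], float]:
    """Keep a candidate field only if it improves the model's score."""
    selected: List[Feature] = []
    best = toy_score(selected, tune)
    for cand in candidates:
        trial = selected + [cand]
        score = toy_score(trial, tune)
        if score - best > min_gain:  # the field helps the model: keep it
            selected, best = trial, score
    return [name for name, _ in selected], best


# Toy tuning set (invented data): field "a" matches the label, "b" does not.
TUNE = [
    {"a": True, "b": False, "label": True},
    {"a": True, "b": True, "label": True},
    {"a": False, "b": True, "label": False},
    {"a": False, "b": False, "label": False},
]
CANDIDATES = [("b", lambda ex: ex["b"]), ("a", lambda ex: ex["a"])]

kept, final_score = sayu_select(CANDIDATES, TUNE)
```

Here field `b` is discarded because adding it leaves the score unchanged, while field `a` is kept because it raises the score. The point of the sketch is only the control flow: each proposed field is judged by its effect on the model, not by a separate ILP scoring function.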
Relevant Intermediate Concepts
Advisedby(Student,Professor)
ta_for(Student,Professor)
ta(Student,Class)  teach(Professor,Class)
coauthor(Person,Person)  paper(Person,Ref)
Goal: Automatically generate and incorporate intermediate concepts
Limitations to Our Old Work
Previously, View Learning only added new fields
More expressive to learn predicates that:
  Are not approximations to target concept
  Represent new tables
Solution: Extend SAYU
VISTA Algorithm
View Invention through Scoring Tables with Aggregation
Algorithm Illustration
Distinguished types: [id, patient, visit]
Candidate heads: p1(id,id) (p1/2), p2(patient) (p2/1)
Candidate bodies from Background Knowledge: sameStudy(Id1,Id2), historyOfBC(Patient), hadBiopsy(_,Patient)
[Figure: candidate rules shown next to Rule 1, Rule 2, Rule 3, … Rule N and the Class Value node; scores shown: 0.0, 0.12, 0.10, 0.15, 0.35]
Algorithm Details
Learn predicates with:
  Target predicate arity
  Target predicate arity + 1
Moded language
Breadth first search over clause bodies
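A breadth-first search over clause bodies visits all one-literal bodies before any two-literal body, and so on. The sketch below shows only that ordering, under simplifying assumptions: literals are plain strings, mode and type declarations are ignored, and bodies are treated as unordered sets of literals. The function name `bfs_bodies` and the length bound `max_len` are hypothetical.

```python
from collections import deque
from typing import List, Tuple


def bfs_bodies(literals: List[str], max_len: int) -> List[Tuple[str, ...]]:
    """Enumerate clause bodies in breadth-first order, shortest first.

    Each body is stored as a tuple of indices into `literals`; a body is
    only extended with later literals, so no conjunction appears twice.
    """
    queue = deque([()])  # start from the empty body
    bodies = []
    while queue:
        body = queue.popleft()
        if body:
            bodies.append(tuple(literals[i] for i in body))
        if len(body) < max_len:
            start = body[-1] + 1 if body else 0
            for i in range(start, len(literals)):
                queue.append(body + (i,))
    return bodies


# Literals taken from the background knowledge shown on the slides.
LITERALS = ["sameStudy(Id1,Id2)", "historyOfBC(Patient)", "hadBiopsy(_,Patient)"]
BODIES = bfs_bodies(LITERALS, max_len=2)
```

With three literals and `max_len=2`, the three single-literal bodies come out first, followed by the three two-literal conjunctions. A real moded search would prune bodies whose variables violate the mode declarations before scoring them.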
Count Aggregation

density_increase(A, B) :- density(A, D1), prior_mammogram_same_loc(A, B), density(B, D2), D1 > D2.

Id  Patient  Date  …  Mass Density  Loc  Benign/Malignant  Count
1   P1       5/02  …  low           RU4  B                 0
2   P1       5/04  …  high          RU4  M                 1
3   P1       5/04  …  none          LL3  B                 0
4   P2       6/00  …  none          RL2  B                 0
5   P2       6/02  …  low           RL2  B                 1
6   P2       9/03  …  high          RL2  M                 2
…   …        …     …  …             …    …                 …
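The Count column can be reproduced with a short Python sketch: for each abnormality A, count the abnormalities B that satisfy density_increase(A, B). Several details are assumptions made for this sketch and are not stated on the slide: densities are ordered none < low < high, dates are read as month/two-digit year, and prior_mammogram_same_loc(A, B) is taken to mean that B belongs to the same patient, has the same location, and has an earlier date.

```python
# Assumed ordering of the density values (not stated on the slide).
DENSITY_RANK = {"none": 0, "low": 1, "high": 2}

# (id, patient, date as "M/YY", mass density, location), from the slide's table.
ROWS = [
    (1, "P1", "5/02", "low", "RU4"),
    (2, "P1", "5/04", "high", "RU4"),
    (3, "P1", "5/04", "none", "LL3"),
    (4, "P2", "6/00", "none", "RL2"),
    (5, "P2", "6/02", "low", "RL2"),
    (6, "P2", "9/03", "high", "RL2"),
]


def date_key(date):
    """Turn "M/YY" into a sortable (year, month) pair (assumed date format)."""
    month, year = date.split("/")
    return (int(year), int(month))


def density_increase(a, b):
    """True when b is a prior finding at the same location and a is denser."""
    same_place = a[1] == b[1] and a[4] == b[4]
    earlier = date_key(b[2]) < date_key(a[2])
    return same_place and earlier and DENSITY_RANK[a[3]] > DENSITY_RANK[b[3]]


def count_aggregate(rows):
    """For each abnormality id, count the rows b with density_increase(a, b)."""
    return {a[0]: sum(density_increase(a, b) for b in rows) for a in rows}


COUNTS = count_aggregate(ROWS)
```

Under these assumptions, `COUNTS` matches the Count column in the table: abnormality 6 has two earlier, less dense findings at RL2, so its count is 2. The aggregation turns a two-argument relation into a single numeric field on the example table.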
Linkage
Distinguished variable may not correspond to example key
p1(Patient) :- historyOfBC(Patient), hadBiopsy(_, Patient).
Above rule adds a field to Patient table
Q: How do we score p1?
Linkage Example

p1(Patient) :- historyOfBC(Patient), hadBiopsy(_, Patient).

Patient  Family History
P1       No
P2       Yes
P3       No
P4       No
P5       Yes
P6       No
…        …

Id  Patient  Date  …  Mass Density  Loc  Benign/Malignant
1   P1       5/02  …  low           RU4  B
2   P1       5/04  …  high          RU4  M
3   P1       5/04  …  none          LL3  B
4   P2       6/00  …  none          RL2  B
5   P2       6/02  …  low           RL2  B
6   P2       9/03  …  high          RL2  M
…   …        …     …  …             …    …
Linkage Example

p1(Patient) :- historyOfBC(Patient), hadBiopsy(_, Patient).

Patient  Family History  p1
P1       No              No
P2       Yes             Yes
P3       No              No
P4       No              No
P5       Yes             No
P6       No              No
…        …               …

Id  Patient  Date  …  Mass Density  Loc  Benign/Malignant
1   P1       5/02  …  low           RU4  B
2   P1       5/04  …  high          RU4  M
3   P1       5/04  …  none          LL3  B
4   P2       6/00  …  none          RL2  B
5   P2       6/02  …  low           RL2  B
6   P2       9/03  …  high          RL2  M
…   …        …     …  …             …    …
Linkage Example

p1(Patient) :- historyOfBC(Patient), hadBiopsy(_, Patient).

Id  Patient  Date  …  Mass Density  Loc  Benign/Malignant  p1
1   P1       5/02  …  low           RU4  B                 No
2   P1       5/04  …  high          RU4  M                 No
3   P1       5/04  …  none          LL3  B                 No
4   P2       6/00  …  none          RL2  B                 Yes
5   P2       6/02  …  low           RL2  B                 Yes
6   P2       9/03  …  high          RL2  M                 Yes
…   …        …     …  …             …    …                 …
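Linkage can be read as a join: p1 is evaluated per patient, and its value is then copied onto every abnormality that refers to that patient, so it can be scored against the example labels. The Python sketch below illustrates this with the family-history values and the abnormality-to-patient mapping from the linkage tables. The `HAD_BIOPSY` facts are hypothetical, chosen only so that p1 agrees with the p1 column shown for the Patient table; the function names are also assumptions.

```python
# Family history of breast cancer per patient, from the Patient table.
HISTORY_OF_BC = {
    "P1": False, "P2": True, "P3": False,
    "P4": False, "P5": True, "P6": False,
}

# Hypothetical biopsy facts (not given on the slides).
HAD_BIOPSY = {"P1", "P2"}

# Abnormality id -> patient, from the abnormality table.
ABNORMALITY_PATIENT = {1: "P1", 2: "P1", 3: "P1", 4: "P2", 5: "P2", 6: "P2"}


def p1(patient):
    """p1(Patient) :- historyOfBC(Patient), hadBiopsy(_, Patient)."""
    return HISTORY_OF_BC.get(patient, False) and patient in HAD_BIOPSY


def link_to_examples(abnormality_patient, predicate):
    """Copy a patient-level field onto each abnormality via the Patient key."""
    return {abn: predicate(pat) for abn, pat in abnormality_patient.items()}


PATIENT_FIELD = {pat: p1(pat) for pat in HISTORY_OF_BC}
EXAMPLE_FIELD = link_to_examples(ABNORMALITY_PATIENT, p1)
```

`PATIENT_FIELD` is the new column on the Patient table, and `EXAMPLE_FIELD` is the same information viewed from the abnormality table, where the class labels live. Abnormalities 4 to 6 all receive Yes because they share patient P2.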
New Features in VISTA
User declares set of distinguished types that appear in clause head
Allow reuse of learned predicates
Count aggregation
Linkage permits learning predicates with:
  Higher arity than target (new tables)
  Different types than target
Experiment
Q: Does VISTA or SAYU perform better?
Datasets
Cora (5 x 2 fold cross validation) [McCallum et al. 00, Kok & Domingos 05]
UW-CSE (5 fold cross validation) [Richardson & Domingos 04]
Mammography (10 fold cross validation) [Davis et al. 05]
Area Under Precision-Recall Curve
Generate whole PR Curve
Area Under PR for Recall > 0.5
[Figure: precision-recall curve; x-axis Recall, y-axis Precision]
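The restricted area can be illustrated with a small Python sketch: rank the examples by score, record precision and recall after each one, then sum the area only over the part of the curve with recall above 0.5. This is a simplified step-wise sum written for illustration, not necessarily the interpolation used in the experiments; ties between scores are not handled, and the scores and labels are invented. The names `pr_points` and `area_above_recall` are hypothetical.

```python
from typing import List, Sequence, Tuple


def pr_points(scores: Sequence[float],
              labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each example, ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    points, true_pos = [], 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        points.append((true_pos / total_pos, true_pos / rank))
    return points


def area_above_recall(points: List[Tuple[float, float]],
                      min_recall: float = 0.5) -> float:
    """Step-wise area under the PR curve, counting only recall > min_recall."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        if recall > prev_recall:
            lower = max(prev_recall, min_recall)  # clip at the recall cutoff
            if recall > lower:
                area += (recall - lower) * precision
            prev_recall = recall
    return area


# Invented scores and labels (1 = positive), already in descending score order.
SCORES = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
LABELS = [1, 1, 0, 1, 0, 1]
AREA = area_above_recall(pr_points(SCORES, LABELS))
```

With a cutoff of 0.5, a perfect ranking would score 0.5, so values from this function are on a 0 to 0.5 scale. The first half of the curve, where recall is low, does not contribute at all.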
Cora
[Figure: precision-recall curves for VISTA and SAYU; x-axis Recall (0 to 1), y-axis Precision (0 to 1)]
UW-CSE
[Figure: precision-recall curves for VISTA and SAYU; x-axis Recall (0 to 1), y-axis Precision (0 to 1)]
Mammography
[Figure: precision-recall curves for VISTA and SAYU; x-axis Recall (0 to 1), y-axis Precision (0 to 1)]
UW-CSE
[Figure: bar chart of Average Area Under PR Curve (y-axis 0.00 to 0.50) for VISTA, SAYU, and MLN]
MLN data from Singla & Domingos AAAI 2005
Related Topic: Predicate Invention
Cigol: Muggleton & Buntine (1988)
CHILLIN: Zelle & Mooney (1994)
FOIL-PILFS: Craven & Slattery (2001)
SLR: Popescul & Ungar (2004)
Related Work: Feature Construction
Pompe & Kononenko, ILP’95
Srinivasan & King, ILP’97
Perlich & Provost, KDD’03
Knobbe, de Haas & Siebes, PKDD’01
Future Work
Further investigate benefits of VISTA
  Linkage as jumping deeper into search space
  Reuse of predicates
Extensions to VISTA
  Negation
  Disjunction
  Stochastic search
Comparisons to other SRL systems
Conclusions
VISTA adds capabilities
  Add fields to tables other than target relation
  Learn new relations
VISTA empirically
  Better on Cora (p-value < 0.001)
  Almost better on UW-CSE (p-value < 0.06)
  No worse on Mammography (p-value < 0.94)
Acknowledgements
Mark Craven, Jude Shavlik, Inês Dutra, Mark Goadrich, Irene Ong, Trevor Walker
Raghu Ramakrishnan
Rich Maclin, Lisa Torrey, Jan Struyf, Allison Holloway
This work was partially supported by Air Force grant F30602-01-2-0571
Thank You!
Questions?