7th December 2010. The Sixth Australasian Ontology Workshop, Adelaide University of South Australia. A Visual Analytics Approach to Augmenting Formal Concepts with Relational Background Knowledge in a Biological Domain. Elma Akand*, Mike Bain, Mark Temple
A Visual Analytics Approach to Augmenting Formal Concepts with Relational Background Knowledge in a Biological Domain
7th December 2010
Elma Akand*, Mike Bain, Mark Temple
*CSE, UNSW / School of Biomedical and Health Sciences, UWS
The Sixth Australasian Ontology Workshop, Adelaide University of South Australia
Outline
Machine learning and data mining in bioinformatics
Domain ontologies in biomedical applications
Formal Concept Analysis
MCW algorithm (Mining Closed itemsets for Web apps)
BioLattice: a web-based browser
Experimental application: systems biology
- Part 1: Concept ranking by gene interaction
- Part 2: Relational learning of multiple-stress rules
Machine learning & Data mining in Bioinformatics
Bioinformatics
“Bioinformatics is the study of information content and information flow in biological systems and processes” (Michael Liebman, 1995)
Machine learning & data mining
- Can offer automatic knowledge acquisition
- A process for discovering knowledge by analyzing data from different perspectives; can contribute greatly to building a knowledge base
Our work: focus on knowledge-based machine learning
- Previous work: learning from ontologies
- Current work: ontology construction by learning
- Potential application areas: ontologies are central to eCommerce and eHealth
- Current application area: systems biology (predicting gene function, data integration)
Ontology
In philosophy: concerned with the nature and relations of being
In knowledge representation: the study of the categorization of things
- Informal ontology: natural language
- Formal ontology: first-order logic or a variant
- Upper ontology: general
- Domain ontology: specific
Ontology
Ontology: “specification of a conceptualization” (Gruber, 1993)
Conceptualization: “formalization of knowledge in declarative form” (Genesereth and Nilsson, 1987)
Gene Ontology
Missing concepts and relations
One gene annotated with different GO terms, where one term is a specialization of the other
[Figure: diagram with nodes labelled a, b, xy and x]
gene: x; concepts: a, b; relations between (i) x and a, (ii) x and b, and (iii) b and a
Formal Concept Analysis (FCA)
Mathematical order theory (Rudolf Wille, early 1980s)
- Derives conceptual structures from data
- Method for data analysis, knowledge representation and information management
Components
- Formal context, concept, concept lattice
Example formal context (x marks that the object has the attribute):

            four-legged  hair-covered  intelligent  marine  thumbed
cats             x            x
dogs             x            x
dolphins                                    x          x
gibbons                       x             x                   x
humans                                      x                   x
whales                                      x          x
Formal concepts in a concept lattice
({cats, gibbons, dogs, dolphins, humans, whales}, {})
({gibbons, dolphins, humans, whales}, {intelligent})
({dolphins, whales}, {intelligent, marine})
({cats, gibbons, dogs}, {hair-covered})
({cats, dogs}, {hair-covered, four-legged})
({gibbons, humans}, {intelligent, thumbed})
({gibbons}, {intelligent, hair-covered, thumbed})
({}, {intelligent, hair-covered, thumbed, marine, four-legged})
[Figure: concept lattice diagram with nodes labelled Top, 1 to 6, and Bottom]
Formal context: an n by m Boolean matrix
- m attributes A (columns)
- n objects O (rows)
Formal concept: Galois connection <X, Y>, where X is a subset of A and Y is a subset of O
Concept lattice, loosely interpretable in ontology terms:
- concept definitions and sub-concept relations, cf. T-box
- concept membership by objects, cf. A-box
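As a minimal sketch of how formal concepts can be derived from a formal context, the following brute-force enumeration closes every attribute subset of the animal example. The function names (`extent`, `intent`, `formal_concepts`) are illustrative assumptions, not names from the talk, and the approach is only practical for very small contexts.

```python
from itertools import combinations

# Formal context from the slides: object -> set of attributes it has.
CONTEXT = {
    "cats": {"hair-covered", "four-legged"},
    "dogs": {"hair-covered", "four-legged"},
    "dolphins": {"intelligent", "marine"},
    "gibbons": {"hair-covered", "intelligent", "thumbed"},
    "humans": {"intelligent", "thumbed"},
    "whales": {"intelligent", "marine"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    common = set(ATTRIBUTES)
    for o in objs:
        common &= CONTEXT[o]
    return frozenset(common)


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    concepts = set()
    for r in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), r):
            objs = extent(set(subset))
            concepts.add((objs, intent(objs)))
    return concepts


concepts = formal_concepts()
# Print from the most general concept (largest extent) to the most specific.
for objs, attrs in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(objs), sorted(attrs))
```

Each pair is closed in both directions: the intent is exactly the set of attributes shared by the extent, and the extent is exactly the set of objects having the intent.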
FCA in data mining
FCA can be seen as a clustering technique in machine learning
- Most of the work is in a propositional framework
In data mining, closed itemset mining is an efficient alternative to FCA
A frequent itemset X is closed if there exists no proper superset Y ⊃ X with support(Y) = support(X)
E.g., if X = {a,b,c,d}, Y = {a,b,c,d,e} and support(Y) = support(X), then X is not closed
Parameters to avoid building the entire lattice
- Extent size must be greater than minsup
Existing closed itemset mining algorithms
- Data structures to speed up closed itemset mining
- But may not build the lattice, or include extents
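The closedness definition and the minsup cut-off above can be illustrated with a small brute-force sketch. The transaction database is a hypothetical toy chosen so that X = {a,b,c,d} and Y = {a,b,c,d,e} have equal support, as in the example. The threshold is applied as `>=`, which is the common convention and an assumption here (the slide says "greater than"). This is not one of the efficient algorithms the slide refers to.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c", "d", "e"},
    {"a", "b", "c", "d", "e"},
    {"a", "b"},
    {"a", "c"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)


def is_closed(itemset):
    """Closed iff every one-item extension has strictly lower support.

    Checking one-item extensions suffices because support never
    increases when items are added.
    """
    itemset = set(itemset)
    base = support(itemset)
    return all(support(itemset | {i}) < base for i in ITEMS if i not in itemset)


def closed_frequent_itemsets(minsup):
    """Map each closed itemset with support >= minsup to its support."""
    result = {}
    for r in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, r):
            s = support(combo)
            if s >= minsup and is_closed(combo):
                result[frozenset(combo)] = s
    return result


print(is_closed({"a", "b", "c", "d"}))  # X from the example above
for itemset, s in closed_frequent_itemsets(2).items():
    print(sorted(itemset), s)
```

Only the closed itemsets need to be kept, since each corresponds to the intent of one concept whose extent is the set of supporting transactions.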
MCW algorithm (Mining Closed itemsets for Web apps)
Vertical data format
IT-tree (itemset-tidset tree) search space
- each node holds X × t(X), and all its children have prefix X
Pruning
- 4 set-difference closure operators
Subsumption check
- a look-up table records all attributes and their occurrences in closed concepts
Lattice
- concepts are added in general-to-specific order
Vertical data format (item: tidset)
D: 2 4 5 6
A: 1 3 4 5
C: 1 2 3 4 5 6
T: 1 3 5 6
W: 1 2 3 4 5
Look-up table for the subsumption check

attribute  Concept_id
D          C1, C2
T          C3, C4
A          C4, C5
W          C2, C4, C5, C6
C          C1, C2, C3, C4, C5, C6, C7
Is {TA}{135} closed? i(135) = {TAWC}, which is larger than {TA}, so {TA} is not closed
Closure operators
{TA}{135} = {TW}{135} -> {TAW}{135}
{D}{2456} ⊂ {C}{123456} -> {DC}{2456}
{D}{2456} and {W}{12345} -> {DW}{245}
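The tidset operations behind these examples can be sketched as follows, using the vertical database from the slide. The helper names (`tidset`, `itemset_of`, `closure`) are illustrative assumptions, and the sketch only reproduces the three cases above; it is not the MCW or CHARM search itself.

```python
from functools import reduce

# Vertical database from the slides: item -> set of transaction ids (tidset).
TIDSETS = {
    "D": {2, 4, 5, 6},
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}


def tidset(itemset):
    """t(X): transactions containing every item of a non-empty itemset."""
    return reduce(set.intersection, (TIDSETS[i] for i in itemset))


def itemset_of(tids):
    """i(T): items that occur in every transaction of tids."""
    return {item for item, ts in TIDSETS.items() if set(tids) <= ts}


def closure(itemset):
    """c(X) = i(t(X)); X is closed exactly when c(X) == X."""
    return itemset_of(tidset(itemset))


def is_closed(itemset):
    return closure(itemset) == set(itemset)


# Is {TA} x {135} closed?
print(sorted(tidset("TA")), sorted(closure("TA")), is_closed("TA"))

# Equal tidsets: TA and TW can be merged into TAW with the same tidset.
print(tidset("TA") == tidset("TW"), sorted(tidset("TAW")))

# Proper subset: t(D) is inside t(C), so D can be extended to DC without losing tids.
print(tidset("D") < tidset("C"), sorted(tidset("DC")))

# Neither contains the other: DW is a new node with the intersected tidset.
print(sorted(tidset("DW")))
```

Storing tidsets per item means that the support of any candidate itemset is obtained by set intersection alone, without rescanning the transactions.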
Based on CHARM (Zaki, 2005)
Concept lattice as a visual analytics approach
Visual analytics
- combination of information visualization with machine learning and data analysis (Keim et al., 2008)
Visualization of concept lattice
- provides an overview of the structure of the domain
- a means for further data analysis, e.g., classification, clustering, implication discovery, rule learning
Previous work
- lattice navigation since Godin et al. (1993)
- browsable concept lattice, e.g., Kim & Compton (2004)
Our current work
- augmenting the concept lattice by integrating multiple sources of knowledge (Gene Ontology, protein interactions) for further analysis & machine learning
Case study: Yeast systems biology
Browsable concept lattice
[Figure label: more general]
Biological validation (1): synthetic lethality
Synthetic lethal interaction: the cell is viable when either gene A or gene B is individually deleted, but cannot grow when both are deleted.
Our results show that 72 (119) concepts in the lattice are more likely than random chance, at p < 0.01 (p < 0.05), to contain synthetic lethal pairs.
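The slides do not say which statistical test produced these p-values. As one possible sketch, assuming a one-sided hypergeometric test over gene pairs, the chance of seeing at least the observed number of synthetic lethal pairs inside a concept could be computed as below. The function name, its parameters and the numbers in the usage line are all hypothetical.

```python
from math import comb


def enrichment_p_value(total_pairs, lethal_pairs, concept_pairs, concept_lethal):
    """P(X >= concept_lethal) under a hypergeometric null model.

    total_pairs    -- all gene pairs considered (assumed population)
    lethal_pairs   -- pairs in the population known to be synthetic lethal
    concept_pairs  -- gene pairs that fall inside one concept's extent
    concept_lethal -- synthetic lethal pairs observed inside that concept
    """
    upper = min(lethal_pairs, concept_pairs)
    tail = sum(
        comb(lethal_pairs, k) * comb(total_pairs - lethal_pairs, concept_pairs - k)
        for k in range(concept_lethal, upper + 1)
    )
    return tail / comb(total_pairs, concept_pairs)


# Hypothetical numbers, for illustration only.
p = enrichment_p_value(total_pairs=10, lethal_pairs=3, concept_pairs=3, concept_lethal=3)
print(p)
```

Under this assumption, a concept would be counted at the p < 0.01 or p < 0.05 level when the returned tail probability falls below that threshold.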
Biological validation (2): ILP learning of concept definitions
Input data
- Protein-protein interaction data
- Microarray gene-expression data
- Transcription factor binding data (ChIP-chip)
- Ontology data
- Biochemical pathway data
Inductive Logic Programming produces a first-order rule
Example rule:
concept(A) :- ppi(B,A,C), ppi(B,A,E), ppi(B,C,E), tfbinds(D,C), tfbinds(F,E).
RSM19 is required for the H2O2 response; RSM19, RSM22 and MRPS17 are in the “mitochondrial ribosomal small subunit” stable complex; and RSM22 and MRPS17 are bound by transcription factors under amino acid starvation.
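To make the reading of the example rule concrete, the sketch below evaluates its body over a handful of toy facts. It assumes that `ppi(B, X, Y)` means proteins X and Y interact within complex B, which is an interpretation of the slide rather than something it states. The complex identifier `mrss` and the transcription factor names `tf_d` and `tf_f` are hypothetical placeholders, and the facts are illustrative, not the data used in the study.

```python
# Toy facts (hypothetical): ppi(complex, protein, protein) and tfbinds(tf, gene).
PPI = {
    ("mrss", "RSM19", "RSM22"),
    ("mrss", "RSM19", "MRPS17"),
    ("mrss", "RSM22", "MRPS17"),
}
TFBINDS = {
    ("tf_d", "RSM22"),
    ("tf_f", "MRPS17"),
}


def covered_by_rule(ppi, tfbinds):
    """Genes A satisfying:
    concept(A) :- ppi(B,A,C), ppi(B,A,E), ppi(B,C,E), tfbinds(D,C), tfbinds(F,E).
    """
    bound = {gene for _tf, gene in tfbinds}  # genes bound by some transcription factor
    covered = set()
    for b, a, c in ppi:
        for b2, a2, e in ppi:
            same_head = (b2, a2) == (b, a)   # both literals share complex B and gene A
            partners_linked = (b, c, e) in ppi
            if same_head and partners_linked and c in bound and e in bound:
                covered.add(a)
    return covered


print(covered_by_rule(PPI, TFBINDS))
```

With these toy facts only RSM19 satisfies the body, matching the explanation above: it interacts with RSM22 and MRPS17 in the same complex, and both partners are bound by transcription factors.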
Conclusions
Many real-world domains are data-intensive
Machine learning and data mining applications required to generate predictive and useful outputs
We focus on knowledge-based learning for comprehensibility – use ontologies
Formal concept analysis as a framework for ontology structure
Use data mining techniques for efficient concept lattice generation
Visual analytics approach: browsable lattice, added background knowledge
Initial validation on a case study from yeast systems biology
Future work
Investigate pseudo-intents to simplify concept lattice
Investigate variants of concept lattice structures
- e.g., concept lattice of inverse context
Add concept definitions to background knowledge in ILP