Christopher Thomas defended his thesis on "Knowledge Acquisition in a System". Video can be found at: http://www.youtube.com/watch?v=NeQomGsJvDk
Knowledge Acquisition in a System
Christopher Thomas
Ohio Center of Excellence in Knowledge-enabled Computing - Kno.e.sis, Wright State University
Dayton, [email protected]
Knowledge Enabled Information and Services Science
Circle of knowledge in a System
Dissertation Overview
Conceptual Knowledge: Ontologies, LoD
Doozer++: Taxonomy extraction, Relationship/Fact extraction [IHI, WebSem1, IEEE-IC, WebSci, WI1]
Information Quality [WI2]
Social processes for content creation [CHB]
Textual Information: Wikipedia, Web
Knowledge merging/Ontology alignment [AAAI, WebSem2, SWSWPC]
Social processes for knowledge validation [IHI, WebSci, CHB]
Knowledge Representation [IJSWIS, CR, FLSW]
Ontology design [WWW, FOIS]
Talk Contents
What is knowledge?
How do we turn propositions/beliefs into knowledge?
How do we acquire information?
Talk outline
• Motivation
• Knowledge Acquisition (KA) Overview
• KA in a loosely connected system – Doozer++
  – Automatic formal domain model creation
  – Information Extraction
    • Top-Down
    • Bottom-Up
  – Information Validation “in use”
• Conclusion
Larger Context of automated KA
• Increasing significance of the knowledge economy
  – “Knowledge workers” spend 38% of their time searching for information (McDermott, 2005)
  – Vital to get a quick yet comprehensive understanding of a field through pertinent concepts/entities and relations/interactions
• Increased demand for formally available knowledge in semantic models
  – Filtering, browsing, annotation, reasoning

McDermott, M. "Knowledge Workers: How can you gauge their effectiveness?" Leadership Excellence, Vol. 22.10, October 2005.
Motivating Scenario
• Learn about a new subject
  – E.g. gain a quick overview of a current or historical event
• Use a formal representation of the gained overview to filter information
  – Facilitate in-depth exploration
• Use the formalized information and the user interaction to create knowledge from information
Motivating Scenario
• Google: India
• Brief description – demographic and geographic information, etc.
Motivating Scenario
• Google: India
• Regular Web results
Motivating Scenario
• Clicking on a link to the Wikipedia entry shows that there have been conflicts with Pakistan over the region of Kashmir
Investigate more
Motivating Scenario
• Google: India Pakistan Kashmir
• Only Web results and news
So far, search engines only display facts about entities, not relationships or larger contexts
Motivating Scenario
• Beneficial to get an overview of a domain “at a glance”
• Automated approach to creating knowledge models for focused areas of interest
• Create models around an incomplete or rudimentary keyword description and “anticipate” the user’s intentions wrt. the full context
Motivating Scenario
Doozer++: india pakistan kashmir
• Important concepts and relationships describing the context
Motivating Scenario
• Filtered IR using concepts in the model
• Concepts and relationships that contributed to clicked results gain support
• User can explicitly approve content
Circle of Knowledge (Example)
Motivating Scenario
• On-demand creation of domain knowledge improves individual comprehension of an event
• Formal models are easy to use in information filtering
• Validated information → knowledge
  – Can be given back to the community to improve the overall amount of formal knowledge available on the Web
  – E.g. it is “unknown” to DBPedia that the region of Kashmir belongs to both India and Pakistan
Importance of Model creation
• Models support the individual user or knowledge worker, but also groups or the system
  – More efficient communication through small, shared, agreeable conceptualizations
    • People ↔ people
    • People ↔ system
    • System ↔ system
  – Classify or filter pertinent and topical information using models
  – Model-assisted searching and faceted or exploratory browsing using relationships
  – Reuse of validated knowledge
Domain Knowledge Models
• Scientific applications
  – In-depth description of concepts
  – Narrow field
  – People ↔ system, system ↔ system
    • Annotation, reasoning
  ⇒ Absolute correctness necessary (as far as possible)
• General applications
  – Broad coverage of the field
  – Context – how does the new information fit in?
  – People ↔ people, people ↔ system
    • Individual domain comprehension, filtering, annotation
  ⇒ Relative correctness sufficient
Model Creation Resources
• Large models are available as reference
  – DBPedia, YAGO, UMLS, MeSH, GO, …
  – Too big to be efficiently and effectively usable
• Prior knowledge required to find pertinent resources
• Other information is available in great abundance, but unformalized
  – Tacit expert knowledge
  – Scientific databases
  – Free text
    • Peer-reviewed journals and proceedings
    • General Web content
Epistemological Considerations
• Knowledge
  – Ensure epistemological soundness of automated knowledge acquisition
• Reference
  – Ensure that nodes in the models refer to real-world concepts/entities
Knowledge
• Functional definition
  – Knowledge = “know-how” – practical, but weak; includes “actionable information”
• Categorical definition
  – Knowledge = justified true belief
  – S knows that p iff
    i. p is true;
    ii. S believes that p;
    iii. S is justified in believing that p.
Belief and Justification
• Belief
  – Statements held by the system
• Justification
  – Trusted sources
  – Extraction algorithms
    • Bayesian, deductive or inductive reasoning
    • Macro-reading algorithms → wisdom of the crowds
  – Validation
Truth assessment of a statement
• Is truth correspondence?
  – “A” is true iff A (a true statement corresponds to an actual state of affairs)
• Is truth coherence?
  – Does the statement fit into the system of other statements?
• Is truth consensus?
  – Agreement on correctness amongst a group
⇒ In the cyclical model, achieve a high degree of certainty by allowing constant validation
Domain Model – Reference
• Model of a domain conceptually split
  – Domain Definition
    • Concepts identified by URIs (classes, entities, relationship types) ensure reference
    • Remains static – necessity
    • Rigid designators (Kripke)
  – Domain Description
    • Relationships describe concepts
    • Subject to change – possibility
    • Definite descriptions (Russell)
Domain Definition
• Top-down concept identification
• Achieved through
  – Manual creation based on consensus in a group
  – Extraction from community-created or peer-reviewed conceptualizations
    • Wikipedia
    • MeSH or UMLS Semantic Network
Domain Description
• Possible to do top-down extraction of the domain description, e.g. from DBPedia
• Problem: formal concept descriptions are sparse
  – On average, DBPedia has fewer than 2 object properties per entity
• Extract descriptions (facts) bottom-up
  – Available in text, DBs, etc.
  – Domain-specific molecular structure extractors (GlycO)
  – Domain-independent IE techniques (Doozer++)
Knowledge Acquisition Approaches
• KA in a tightly connected system
  – GlycO: domain-specific biochemistry ontology
    • Manual domain definition and description
    • Partial automatic domain description
    • Domain-specific automatic validation
    • Manual validation for false negatives
• KA in a loosely connected system
  – Doozer++: general domain-model creation framework
    • Automatic domain definition, top-down concept extraction
    • Automatic domain description, bottom-up fact extraction
      – Extraction from trusted sources
      – A trusted extraction and validation procedure
    • Domain-independent community-based validation
Knowledge Acquisition Approaches
Approaches compared (Definition / Description / Verification):
• Knowledge Engineering approach
  – Definition: Top-down
  – Description: Top-down
  – Verification: Manual
• Traditional extraction approach
  – Definition: Bottom-up
  – Description: Bottom-up
  – Verification: Manual
• GlycO
  – Definition: Top-down knowledge engineering
  – Description: Bottom-up, restricted by top-down definition
  – Verification: Correctness automatic; exceptions added manually
• Doozer++
  – Definition: Top-down, conceptually, by extraction from a top-down corpus
  – Description: Bottom-up, restricted by top-down definition
  – Verification: Community-based validation
KA on the Web - Vision
• Web searches, browsing sessions or classification tasks can be seen as creating an implicit domain model
  – World view, concept coverage, facts
• Make models explicit and reusable using formal descriptions (RDF, OWL)
• Validate the contained information and share it with the community
⇒ Increase the system’s knowledge by “doing what you do”: search, browse, click, communicate
KA in a Loosely Connected System
• Doozer++
  – Domain Definition: top-down concept extraction
  – Domain Description: pattern-based fact extraction
• Sources: Linked Data; free text (Wikipedia, Web)
• Scooner – evaluation in use: semantic browsing and retrieval, domain-independent, community-based
• Domain model creation to gradually increase the overall knowledge of the system
  – User-interest driven
  – Incentive to evaluate
Cycle: Domain Definition → Domain Description → Validation
Domain Definition Requirements
• Identify concepts, concept labels (denotations) and a concept hierarchy
• Challenge: define narrow boundaries for a domain while at the same time ensuring broad conceptual coverage within the domain
Domain definition - conceptual
• Expand-and-Reduce approach
  – Start with “high recall” methods
    • Exploration – full-text search
    • Exploitation – graph-similarity method
    • Category growth
    • “What could be in the domain?”
  – End with “high precision” methods
    • Apply restrictions on the concepts found
    • Remove terms and categories that fall outside the dense areas of the model graph
    • “What should be in the domain?”
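The expand-and-reduce strategy can be sketched as a two-phase graph routine. This is a minimal sketch, assuming concepts form an adjacency map; the names `expand_and_reduce`, `neighbors`, `hops` and `min_links` are illustrative, not the actual Doozer++ API:

```python
def expand_and_reduce(seeds, neighbors, hops=2, min_links=2):
    """Expand: collect every concept reachable within `hops` of the seeds
    (high recall: 'what could be in the domain?').
    Reduce: keep only concepts with at least `min_links` edges into the
    candidate set (high precision: 'what should be in the domain?')."""
    frontier, candidates = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {n for c in frontier for n in neighbors.get(c, ())} - candidates
        candidates |= frontier
    return {c for c in candidates
            if c in seeds
            or sum(n in candidates for n in neighbors.get(c, ())) >= min_links}

# Tiny invented concept graph for illustration
graph = {
    "india": ["kashmir", "pakistan"],
    "pakistan": ["kashmir", "cricket"],
    "kashmir": ["india", "pakistan"],
    "cricket": ["bat"],
}
domain = expand_and_reduce({"india"}, graph)
# the sparsely linked "cricket" node falls outside the dense area and is pruned
```

The real system scores candidates with full-text search and graph similarity rather than raw edge counts; the sketch only illustrates the expand-then-restrict shape of the algorithm.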
Domain Description - Classifier
• Concept-aware
  – Use concepts and concept labels from the domain definition step
• Fact extraction as classification of concept pairs into relationship types
  – f_class: C × C → R
  – R_{S,O} = {R | p(R, S, O) > ε}
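The set R_{S,O} is simply a threshold filter over the classifier's scores for a concept pair. A minimal sketch (the scores shown are invented):

```python
def classify_pair(scores, eps=0.5):
    """R_{S,O} = {R | p(R, S, O) > eps}: keep every relationship type
    whose probability for the concept pair (S, O) clears the threshold."""
    return {rel for rel, prob in scores.items() if prob > eps}

# Hypothetical per-relationship probabilities for one (S, O) pair
rels = classify_pair({"almaMater": 0.71, "birthPlace": 0.12, "employer": 0.55})
```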
Domain Description
• Combined language model and semantic classification model
• Language model: surface-pattern based
  – Pattern manifestations of relationships as features
  – Open to any corpus, language independent
  – Less computational overhead than NLP
• Semantic classification model
  – Learned or assigned concept labels
  – Semantic types to aid classification
Domain Description - Implementation
• Probabilistic vector-space model
  – Each relationship is defined by vectors of
    • Pattern probabilities
    • Domain/range probabilities
  – Each concept is grounded by its semantic types and manifested by its labels and their probabilities of identifying the concept
  – Sparse pattern representation (density ~2%)
  – White-box, easily verifiable
  – Inherently parallel
Terminology
Symbol | Meaning | Example
S, O | Subject and Object concepts (semantic) | Kelly_Miller_(scientist), Howard_University
L_S, L_O | Subject and Object labels | “Kelly Miller”, “Howard University”
P_{L_S,L_O} | Phrase instantiating the pattern | Kelly Miller graduated from Howard University
P | Pattern | <Subject> graduated from <Object>
T_S, T_O | Semantic type of Subject or Object | Person, Educational_Institution
R | Relationship | almaMater, birthPlace
Probabilistic Classifier
• Labels taken from a lexicon or linked corpus
• Patterns learned from free text
• Semantic types asserted in an ontology or learned from linked data
Probabilistic Classifier
Sentence in corpus:
  “Obama graduated in 1983 from Columbia University with a degree in political science and international relations.”
How is Barack Obama related to Columbia University?
  p(R, Barack_Obama, Columbia_University)
(Regular classification requires multiple examples)
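Turning such a sentence into a surface pattern amounts to replacing the subject and object labels with placeholders and keeping the text between them. A simplified sketch; `extract_pattern` is an illustrative helper that only handles the subject-before-object case:

```python
def extract_pattern(sentence, subj_label, obj_label):
    """Replace the subject/object labels with placeholders and keep the
    intervening text as the surface pattern (subject-before-object only)."""
    i = sentence.find(subj_label)
    j = sentence.find(obj_label)
    if i < 0 or j < 0 or i + len(subj_label) >= j:
        return None  # labels missing or in the wrong order
    middle = sentence[i + len(subj_label):j].strip()
    return f"<Subject> {middle} <Object>"

sent = ("Obama graduated in 1983 from Columbia University with a degree "
        "in political science and international relations.")
pattern = extract_pattern(sent, "Obama", "Columbia University")
# pattern == "<Subject> graduated in 1983 from <Object>"
```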
Probabilistic Classifier
p(almaMater, Barack_Obama, Columbia_University) =
  p(almaMater | “<Subject> graduated in 1983 from <Object>”) ×
  p(Barack_Obama | “Obama”) ×
  p(Columbia_University | “Columbia University”) ×
  p(almaMater | domain(person)) ×
  p(almaMater | range(academic_institution))
= 0.9 × 0.95 × 0.95 × 0.9 × 0.97
= 0.70909425

Matched text: “Obama graduated in 1983 from Columbia University”
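The score is a plain product of conditional probabilities: pattern evidence, label groundings, and domain/range type evidence. A minimal sketch reproducing the example's numbers; `score_fact` is an illustrative name, not the system's actual API:

```python
from math import prod, isclose

def score_fact(p_rel_given_pattern, p_subj_given_label,
               p_obj_given_label, p_rel_given_domain, p_rel_given_range):
    """Combine pattern, label and type evidence into p(R, S, O)."""
    return prod([p_rel_given_pattern, p_subj_given_label,
                 p_obj_given_label, p_rel_given_domain, p_rel_given_range])

# The almaMater example from the slide
p = score_fact(0.9, 0.95, 0.95, 0.9, 0.97)
```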
Pattern Generalization
• Problem: low recall in pattern-based IE
• Substitute terms with wildcards
  – No POS tagging, hence only “*” wildcards
• Mirrors shortest paths through parse trees

<Subject> graduated in 1983 from <Object>
<Subject> * in 1983 from <Object>
<Subject> graduated * 1983 from <Object>
<Subject> * * 1983 from <Object>
<Subject> graduated in * from <Object>
<Subject> * in * from <Object>
<Subject> graduated * * from <Object>
<Subject> * * * from <Object>
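The variants above can be enumerated mechanically: each inner token is independently kept or replaced by "*". A sketch under one reading of the example, where the token adjacent to <Object> stays fixed:

```python
from itertools import product

def generalize(pattern):
    """Enumerate wildcard generalizations of a surface pattern.
    Inner tokens (everything between <Subject> and the token adjacent
    to <Object>) are independently replaced by '*' or kept."""
    head, *inner, anchor, tail = pattern.split()
    variants = []
    for mask in product((False, True), repeat=len(inner)):
        mid = ["*" if m else tok for tok, m in zip(inner, mask)]
        variants.append(" ".join([head, *mid, anchor, tail]))
    return variants

variants = generalize("<Subject> graduated in 1983 from <Object>")
# 2**3 = 8 variants, from the original pattern down to
# "<Subject> * * * from <Object>"
```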
Learning p(R|P)
• Distantly supervised training
• Collect pattern frequencies for training examples
  – Fact triples <S, R, O>, e.g. from Linked Data (DBPedia, UMLS)
  – Manifestations of facts in text in the form of patterns (corpus: e.g. Web, Wikipedia, MedLine)
• For relationship R_i, aggregate the pattern vectors representing <*, R_i, *>
Learning p(R|P) – naïve
• For each vector R_i containing pattern frequencies for relationship R_i, compute the number of occurrences of pattern P_j with terms denoting pairs <S, O> ∈ R_i, normalized by all pattern occurrences for R_i
Learning p(R|P) – naïve
• Uniform distribution of relationships assumed
  – As the number of relationship types grows, the prior of each type goes towards 0
  – Normalize the probabilities over the column vector to get p(R_i|P_j)
• Vector-space representation
  – Relationship-pattern matrix
  – R2P_ij = p(R_i|P_j)
Problem: Relationship Similarities
• Extensional similarity
  – Semantically different relationships can share Subject-Object pairs in training data
• Intensional similarity
  – Overlap and entailment of relationship types
• Types should not be seen as discrete
  – E.g. physical_part_of entails part_of
• A priori unknown which types overlap unless a formal description is available
  – Semantically similar types compete for the same patterns
Relationship similarities
Pertinence: measure similarity between pattern vectors as an approximation of intensional similarity.
Pertinence for Relationships
• Do not punish the occurrence of the same pattern with relationship types that are intensionally similar, but extensionally dissimilar
• Reduce the impact of extensionally similar relations
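Measuring similarity between pattern vectors can be realized, for instance, as cosine similarity over the rows of the relationship-pattern matrix. This is a sketch under that assumption, not necessarily the thesis's exact Pertinence formula; the matrix values are invented:

```python
import numpy as np

def pattern_vector_similarity(r2p):
    """Cosine similarity between the pattern vectors of all relationship
    pairs; an approximation of intensional similarity, so intensionally
    similar types are not punished for sharing patterns."""
    unit = r2p / np.linalg.norm(r2p, axis=1, keepdims=True)
    return unit @ unit.T

r2p = np.array([[0.6, 0.3, 0.1],   # e.g. part_of
                [0.5, 0.4, 0.1],   # e.g. physical_part_of (similar profile)
                [0.0, 0.1, 0.9]])  # an unrelated type
sim = pattern_vector_similarity(r2p)
```

The first two rows, which overlap heavily in their pattern profiles, score much higher against each other than against the third.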
Pertinence Example
Pattern: <Subject> in the right <Object>

Relationship | p(R|P)
biological_process_has_associated_location | 0.968371381
disease_has_associated_anatomic_site | 0.880452774
part_of | 0.622532958
has_finding_site | 0.561041318
has_location | 0.537424451
has_direct_procedure_site | 0.363832078
Sum: | 3.933654958

Note: this never causes p(R, S, O) > 1.
Similarities between relationships
Pertinence evaluation
(Figure: precision–recall curves with Pertinence vs. without Pertinence; x-axis: recall, y-axis: precision.)
Fact extraction evaluation - DBPedia
(Figure: precision and recall vs. confidence threshold.)
Strict evaluation: only the 1st-ranked extracted relation is compared to the gold standard; averaged over 107 relation types.
60% training set, 40% testing; DBPedia Infobox fact corpus, Wikipedia text corpus.
Sample results (DBPedia)
Subject :: Object | Suggested relationship | Rank 1 (rel; confidence) | Rank 2 | Rank 3
Howard Pawley :: Gary Filmon | after | successor; 0.799 | after; 0.768 | office; 0.686
Mulan :: Tarzan | after | nextSingle; 0.603 | followedBy; 0.533 | after; 0.416
Species Deceases :: Midnight Oil | artist | producer; 0.761 | artist; 0.719 | genre; 0.467
The Crystal City :: Orson Scott Card | author | artist; 0.625 | author; 0.617 | writer; 0.583
Horatio Allen :: William Maxwell | before | predecessor; 0.629 | before; 0.475 |
Basdeo Panday :: Trinidad & Tobago | birthplace | deathPlace; 0.658 | birthplace; 0.658 | nationality; 0.330
Bob Nystrom :: Stockholm | birthplace | cityOfBirth; 0.677 | birthplace; 0.513 |
Beccles railway station :: Suffolk | borough | district; 0.772 | borough; 0.770 | friend; 0.749
Fact extraction evaluation - UMLS
(Figure: precision and recall vs. confidence threshold.)
Strict evaluation: only the 1st-ranked extracted relation is compared to the gold standard; averaged over ~100 relation types.
60% training set, 40% testing; UMLS fact corpus, MedLine text corpus.
Sample results (UMLS)
Subject :: Object | Suggested relationship | Extracted rank 1
Teeth :: poisoning, fluoride | finding_site_of | finding_site_of
polyps :: polyp of cervix nos (disorder) | associated_with | associated_with
neck of uterus :: polyp of cervix nos (disorder) | location_of | finding_site_of
benign neoplasms :: polyp of colon | related_to | associated_with
brain ischemia :: brain | has_finding_site | location_of
gastrointestinal tract :: polyp of colon | is_primary_anatomic_site_of_disease | location_of
gamete structure (cell structure) :: polyvesicular vitelline tumor | is_normal_cell_origin_of_disease | is_normal_cell_origin_of_disease
Comparison – DBPedia corpus
Mintz: extraction of 102 relationship types from Freebase; Doozer++: 107 from DBPedia.
(Figure: precision–recall curves for Mintz-POS, Mintz-NLP, Doozer++ (R) and Doozer++ (P); x-axis: recall, y-axis: precision.)
(R) Recall-oriented, using pattern generalization
(P) Precision-oriented, no generalization
M. Mintz, S. Bills, R. Snow, and D. Jurafsky, “Distant supervision for relation extraction without labeled data,” ACL 2009.
Evaluate Ad-Hoc Model Creation
• On-demand creation of models

Domain | Query | Number of concepts | Precision (domain definition)
Semantic Web | “Semantic Web” OWL ontologies RDF | 143 | 0.98
Harry Potter | “Harry Potter” dumbledore gryffindor slytherin | 134 | 0.98
Beatles | Beatles “John Lennon” “Paul McCartney” song | 250 | 0.99
India–Pakistan Relations | India Pakistan Kashmir | 129 | 0.99
US Financial Crisis – TARP | tarp “financial crisis” “toxic assets” | 146 | 0.93
German Chancellors | German chancellors “Angela Merkel” “Helmut Kohl” | 124 | 0.91
Ad-Hoc Model Creation - Evaluation
Ad-Hoc Model Creation - Evaluation
Relative Recall
Recall wrt. the possible extraction, i.e. the maximum number of extracted facts marks 100% recall.
Related Work
(Diagram positioning related work – Mintz, SOFIE, Turney – along dimensions: structural vs. Open IE; supervised, distant supervision, coupled learner. Doozer++ uses surface patterns only, with Pertinence for semantic similarity.)
Main Differences
• Surface patterns only
• Only positive training examples
• Pertinence measure for semantic similarity
• Concept-aware: start with defined concepts
• Include background knowledge in probabilistic classification instead of rule-based reasoning
Related work
• Pattern-based fact extraction
  – E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. JCDL 2000.
  – F. M. Suchanek, M. Sozio, and G. Weikum. SOFIE: A Self-Organizing Framework for Information Extraction. WWW 2009.
  – T. M. Mitchell, J. Betteridge, A. Carlson, E. Hruschka, and R. Wang. Populating the Semantic Web by Macro-Reading Internet Text. ISWC 2009.
  – M. Pasca, D. Lin, J. Bigham, A. Lifchits, and A. Jain. Organizing and searching the world wide web of facts – step one: the one-million fact extraction challenge. AAAI 2006.
Related work
• Relationship-pattern computations
  – P. D. Turney and P. Pantel. From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37, 2010.
  – P. D. Turney. Expressing implicit semantic relations without supervision. ACL 2006.
Summary Fact extraction
• Pattern-based fact extraction with generalization and Pertinence achieves competitive precision and recall while being computationally feasible for large-scale extraction
  – Pertinence computation can also serve as a preprocessing step for other ML techniques
• Different types of background knowledge are incorporated into one statistical framework
  – Combined language model and semantic model
Application and Knowledge Validation
• Scooner: semantic browsing and retrieval – evaluation in use
• Doozer++
  – Hierarchy extraction
  – Pattern-based fact extraction
• Sources
  – 18 million MedLine publications/abstracts
  – UMLS Metathesaurus
  – Wikipedia
Example: a domain model as a basis for research in the area of human cognitive performance.
Domain Definition – Extracted Hierarchy
A hierarchy extracted for a cognitive science domain model.
The keyword description given to the system was a collection of terms relevant to human performance and cognition.
Domain Description: Connect Concepts
Expert Evaluation of Facts in the Model
(Figure: histogram of expert scores 1–9, showing the fraction of facts in each bin with cumulative incorrect, correct and interesting curves.)
1–2: information that is overall incorrect
3–4: information that is somewhat correct
5–6: correct general information
7–9: correct information not commonly known
Extractor Confidence vs. Correctness
• Analysis shows that the highest-quality extractions have the highest confidence, but some incorrectly extracted facts also have high confidence
⇒ High-quality patterns as well as some noise patterns have high indicative power.
Extractor Confidence vs. Correctness
• Many facts deemed interesting were extracted based on highly specialized patterns in the long tail of the frequency distribution.
• Noisy patterns also tend to occupy this space
Sources of Errors
• Extracted relationship too specific, or formally incorrect but metaphorically correct
  – <Interpeduncular_Cistern disease_has_associated_anatomic_site Cerebral_peduncle> is incorrect:
    • the Interpeduncular Cistern is not a disease; however, it does have the associated anatomic site Cerebral peduncle
• Incorrect directionality
  – <Pituitary_Gland sends_output_to Supraoptic_nucleus> should be <Supraoptic_nucleus sends_output_to Pituitary_Gland>
    • Direction in text is often expressed in the context rather than the immediate pattern
Validation
• Extracted statements need to be validated to be considered knowledge
  – Explicit validation, e.g. thumbs up/down
  – Implicit validation, e.g. by analyzing click streams
Explicit Validation
• Certainty of reference
  – I.e. we know exactly which statement was validated
• Validator credentials can be obtained
  – E.g. a small community of experts may evaluate
• Extra work
  – Explicit validation is a task that is consciously performed
Implicit Validation
• Find indications of correctness or incorrectness based on the way users interact with the presented information
  – Every action taken on a piece of information is recorded and analyzed
  – The cumulative behavior of the users gives an indication of which propositions are correct or interesting
Implicit Validation
• Examples of implicit community validation
  – Games with a purpose (L. von Ahn)
  – Google search rankings
• Scooner semantic browser
  – Browse literature along facts in a model
  – Browsing trails suggest correct extraction
Implicit Validation
• A fact is browsed very often by different users
  – The fact is interesting to many users
  – The fact is surprising and interesting, but may be incorrect
• A user follows a trail of multiple fact triples through a variety of documents
  – The facts that were browsed have a high probability of being correct, and support is added to the triples
  – If the trail was longer than suggested by a small-world phenomenon, the initial triples may have been incorrect but led to interesting ones; for this reason, only the last k triples of the trail should garner support, or the support should increase for the last k triples
  – The last triple in the trail may have been incorrect and led to browsing results that caused the user to stop browsing; for this reason, the last triple of the trail should be treated with caution
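These trail heuristics can be sketched as a simple support-update rule: only the last k triples gain support, and the final triple, which may have ended the session on a bad result, is discounted. An illustrative sketch; the function name, `k` and the weights are assumptions, not the evaluated Scooner policy:

```python
def update_support(support, trail, k=3, delta=1.0, last_discount=0.5):
    """Credit only the last k fact-triples of a browsing trail; the very
    last triple may have caused the user to stop browsing, so it gets a
    reduced weight instead of full support."""
    if not trail:
        return support
    credited = trail[-k:]
    for idx, triple in enumerate(credited):
        is_last = idx == len(credited) - 1
        w = delta * last_discount if is_last else delta
        support[triple] = support.get(triple, 0.0) + w
    return support

trail = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r3", "d"), ("d", "r4", "e")]
support = update_support({}, trail, k=3)
# the r1 triple gets nothing; r2 and r3 get 1.0; r4 (last) gets 0.5
```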
Validation “through use”
1. Enter search terms
2. Choose entity of interest
3. Browse extracted facts
4. Choose relevant literature that supports the fact
Validation “through use”
• Fact trails are recorded
• Find another interesting fact
Validation “through use”
The path suggests that at least the first 2 triples are factually correct.
Browsed Facts Examples
Related work
• Evaluation and use
  – E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. SIGIR 2006.
  – A. Das, M. Datar, A. Garg, and S. Rajaram. Google News Personalization: Scalable Online Collaborative Filtering. WWW 2007.
Summary Knowledge Acquisition
• The model reflects what the user is actually interested in at the point of creation → willingness to help validate facts
  – Applications allow for implicit and explicit evaluation
• Validated statements can be merged with existing knowledge → automated acquisition completed; individual-driven KA improves the overall system
• R. Kavuluru, C. Thomas, et al. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains. IHI 2012.
• A. Sheth, C. Thomas, P. Mehra. Continuous Semantics to Analyze Real-Time Data. IEEE Internet Computing, Nov./Dec. 2010.
• C. Thomas, et al. Improving Linked Open Data through On-Demand Model Creation. Web Science Conference, 2010.
• C. Thomas, et al. Growing Fields of Interest – Using an Expand and Reduce Strategy for Domain Model Extraction. WI 2008.
Future Directions
• Active learning to improve classification
  – Easy in a tightly connected system (e.g. NELL)
  – Feedback mechanism for loosely connected systems
• Improve depth of classification
  – Augment the Domain Description with concept hierarchies learned from text (e.g. Navigli)
• Knowledge management for background knowledge
  – Belief updates
  – Model evolution
Contributions
Conceptual Knowledge: Ontologies, LoD
Taxonomy extraction [WI1, WebSci, WebSem1]
Event modeling [IEEE-IC]
Relationship/Fact/Event extraction [IHI, WebSem1, IEEE-IC, WebSci]
Information Quality [WI2]
Social processes for content creation [CHB]
Textual Information: Wikipedia, Web
Knowledge merging/Ontology alignment [AAAI, WebSem2, SWSWPC]
Social processes for knowledge validation [IHI, WebSci, CHB]
Knowledge Representation [IJSWIS, CR, FLSW]
Ontology design [WWW, FOIS]
Journal/Conference Publications
[WebSem1] C. Thomas, P. Mehra, A. Sheth, W. Wang, G. Weikum. Automatic domain model creation using pattern-based fact extraction. Submitted to the Journal of Web Semantics.
[IHI] R. Kavuluru, C. Thomas, A. Sheth, V. Chan, W. Wang, A. Smith, A. Sato and A. Walters. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains. IHI 2012 – 2nd ACM SIGHIT International Health Informatics Symposium, January 28–30, 2012.
[IEEE-IC] A. Sheth, C. Thomas, P. Mehra. Continuous Semantics to Analyze Real-Time Data. IEEE Internet Computing, vol. 14, no. 6, pp. 84–89, Nov./Dec. 2010. doi:10.1109/MIC.2010.137
[WebSci] C. Thomas, W. Wang, P. Mehra and A. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. Web Science Conference, 2010.
[WI1] C. Thomas, P. Mehra, R. Brooks, and A. Sheth. Growing Fields of Interest – Using an Expand and Reduce Strategy for Domain Model Extraction. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 1:496–502, 2008.
Journal/Conference Publications
[WI2] C. Thomas and A. Sheth. Semantic Convergence of Wikipedia Articles. In Proceedings of the 2007 IEEE/WIC International Conference on Web Intelligence, pages 600–606, Washington, DC, USA, November 2007. IEEE Computer Society.
[WWW] S. S. Sahoo, C. Thomas, A. Sheth, W. S. York, and S. Tartir. Knowledge Modeling and its Application in Life Sciences: A Tale of two Ontologies. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 317–326, New York, NY, USA, 2006. ACM Press.
[FOIS] C. Thomas, A. Sheth, and W. York. Modular Ontology Design Using Canonical Building Blocks in the Biochemistry Domain. In Proceeding of the 2006 conference on Formal Ontology in Information Systems: Proceedings of the Fourth International Conference (FOIS 2006), pages 115–127, Amsterdam (NL), 2006. IOS Press.
[AAAI] P. Doshi and C. Thomas. Inexact matching of ontology graphs using expectation-maximization. In AAAI’06: proceedings of the 21st national conference on Artificial intelligence, pages 1277–1282. AAAI Press, 2006.
Publications
[CHB] C. Thomas and A. Sheth. Web Wisdom - An Essay on How Web 2.0 and Semantic Web can foster a Global Knowledge Society. Computers in Human Behavior, Elsevier.
[WebSem2] P. Doshi, R. Kolli, and C. Thomas. Inexact matching of ontology graphs using expectation-maximization. Web Semantics: Science, Services and Agents on the World Wide Web, 7(2):90–106, 2009.
[IJWGS] V. Kashyap, C. Ramakrishnan, C. Thomas, and A. Sheth. Taxaminer: an experimentation framework for automated taxonomy bootstrapping. International Journal of Web and Grid Services, 1(2):240–266, 2005.
[IJSWIS] A. P. Sheth, C. Ramakrishnan, and C. Thomas. Semantics for the semantic web: The implicit, the formal and the powerful. Int. J. Semantic Web Inf. Syst., 1(1):1–18, 2005.
[CR] S. Sahoo, C. Thomas, A. Sheth, C. Henson, and W. York. GLYDE – an expressive XML standard for the representation of glycan structure. Carbohydrate Research, 340(18):2802–2807, 2005.
Other Publications
Workshop Publications
[SWLS] A. Sheth, W. York, C. Thomas, M. Nagarajan, J. Miller, K. Kochut, S. Sahoo, and X. Yi. Semantic Web technology in support of Bioinformatics for Glycan Expression. In W3C Workshop on Semantic Web for Life Sciences, pages 27–28, 2004.
[SWSWPC] N. Oldham, C. Thomas, A. Sheth, and K. Verma. METEOR-S Web Service Annotation Framework with Machine Learning Classification. Semantic Web Services and Web Process Composition, pages 137–146, 2005, Springer.
Book Chapters
[FLSW] C. Thomas and A. Sheth. On the expressiveness of the languages for the semantic web - making a case for a little more. Fuzzy Logic and the Semantic Web, pages 3–20, 2006.
Patent
[PAT] P. Mehra, R. Brooks and C. Thomas. ONTOLOGY CREATION BY REFERENCE TO A KNOWLEDGE CORPUS. Pub.No. US 2010/0280989 A1
• Research
  – KR
  – Domain model extraction / IE
• Collaborations
  – Complex Carbohydrate Research Center at UGA
  – HP Labs Palo Alto
  – Human Performance Directorate, AFRL
• Proposals
  – HP Incubation & Innovation grant for Doozer++
  – AFRL grant largely based on Doozer++
  – NSF proposal submitted with “very good” reviews
• Tools and Ontologies
  – GlycO
  – GlycoViz
  – Doozer++
  – Scooner
Thank you!
Gerhard Weikum
Shaojun Wang
Pascal Hitzler
Pankaj Mehra
Amit Sheth
Thanks to all Kno.e.sis Center members – past and present