PhD thesis defense of Christopher Thomas

Knowledge Acquisition in a System

Christopher Thomas

Ohio Center of Excellence in Knowledge-enabled Computing

- Kno.e.sis, Wright State University

Dayton

Knowledge Enabled Information and Services Science

Circle of knowledge in a System


Dissertation Overview

Conceptual Knowledge: Ontologies, LoD

Doozer++: Taxonomy extraction, Relationship/Fact extraction [IHI, WebSem1, IEEE-IC, WebSci, WI1]

Information Quality [WI2]

Social processes for content creation [CHB]

Textual Information: Wikipedia, Web

Knowledge merging/Ontology alignment [AAAI, WebSem2, SWSWPC]

Social processes for knowledge validation [IHI, WebSci, CHB]

Knowledge Representation [IJSWIS, CR, FLSW]

Ontology design [WWW, FOIS]


Talk Contents

What is knowledge?

How do we turn propositions/beliefs into knowledge?

How do we acquire information?


Talk outline

• Motivation
• Knowledge Acquisition (KA) Overview
• KA in a loosely connected system – Doozer++
  – Automatic formal domain model creation
  – Information Extraction
    • Top-Down
    • Bottom-Up
  – Information Validation “in use”
• Conclusion


Larger Context of automated KA

• Increasing significance of the knowledge economy
  – “Knowledge Workers” spend 38% of their time searching for information (McDermott, 2005)
  – Vital to get a quick and still comprehensive understanding of a field through pertinent concepts/entities and relations/interactions
• Increased demand for formally available knowledge in semantic models
  – Filtering, browsing, annotation, reasoning

McDermott, M. “Knowledge Workers: How can you gauge their effectiveness.” Leadership Excellence. Vol. 22.10. October 2005.


Motivating Scenario

• Learn about a new subject
  – E.g. gain a quick overview of a current or historical event
• Use a formal representation of the gained overview to filter information
  – Facilitate in-depth exploration
• Use the formalized information and the user interaction to create knowledge from information


Motivating Scenario

• Google: India

• Brief description – demographic-, geographic information, etc.



Motivating Scenario

• Google: India

• Regular Web results



Motivating Scenario

• Clicking on a link to the Wikipedia entry shows that there have been conflicts with Pakistan over the region of Kashmir

Investigate more



Motivating Scenario

• Google: India Pakistan Kashmir

• Only Web results and news

So far, search engines only display facts about entities, not relationships or larger contexts



Motivating Scenario

• Beneficial to get an overview “at a glance” of a domain

• Automated approach to creating knowledge models for focused areas of interest

• Create models around an incomplete or rudimentary keyword description and “anticipate” the user’s intentions with respect to the full context


Motivating Scenario

Doozer++: india pakistan kashmir

• Important concepts and relationships describing the context


Motivating Scenario

• Filtered IR using concepts in the model

• Concepts and relationships that contributed to clicked results gain support

• User can explicitly approve content



Circle of Knowledge (Example)


Motivating Scenario

• On-demand creation of domain knowledge improves individual comprehension of an event

• Formal models are easy to use in information filtering

• Validated information ⇒ Knowledge
  – Can be given back to the community to improve the overall amount of formal knowledge available on the Web
  – E.g. “Unknown” to DBPedia that the region of Kashmir belongs to both India and Pakistan


Importance of Model creation

• Models support individual users or knowledge workers, but also groups or systems
  – More efficient communication through small, shared, agreeable conceptualizations
    • People ↔ people
    • People ↔ system
    • System ↔ system

– Classify or filter pertinent and topical information using models

– Model-assisted searching and faceted or exploratory browsing using relationships

– Reuse of validated knowledge


Domain Knowledge Models

• Scientific applications
  – In-depth description of concepts
  – Narrow field
  – People ↔ system, system ↔ system
    • Annotation, reasoning ⇒ Absolute correctness necessary (as far as possible)
• General applications
  – Broad coverage of the field
  – Context: how does the new information fit in?
  – People ↔ people, people ↔ system
    • Individual domain comprehension, filtering, annotation ⇒ Relative correctness sufficient


Model Creation Resources

• Large models are available as reference
  – DBPedia, YAGO, UMLS, MeSH, GO …
  – Too big to be used efficiently and effectively
• Prior knowledge required to find pertinent resources
• Other information is available in great abundance, but unformalized
  – Tacit expert knowledge
  – Scientific databases
  – Free text
    • Peer-reviewed journals and proceedings
    • General Web content


Epistemological Considerations

• Knowledge
  – Ensure epistemological soundness of automated knowledge acquisition
• Reference
  – Ensure that nodes in the models refer to real-world concepts/entities


Knowledge

• Functional Definition
  – Knowledge = “Know-How”
  – Practical, but weak; includes “Actionable Information”
• Categorical Definition
  – Knowledge = Justified true belief
  – S knows that p iff
    i. p is true;
    ii. S believes that p;
    iii. S is justified in believing that p.


Belief and Justification

• Belief
  – Statements held by the system
• Justification
  – Trusted sources
  – Extraction algorithms
    • Bayesian, deductive or inductive reasoning
    • Macro-Reading algorithms
  – Wisdom of the crowds
  – Validation


Truth assessment of a statement

• Is truth correspondence?
  – “A” is true iff A (a true statement corresponds to an actual state of affairs)
• Is truth coherence?
  – Does the statement fit into the system of other statements?
• Is truth consensus?
  – Agreement of correctness amongst a group

⇒ In the cyclical model, achieve a high degree of certainty by allowing constant validation

No Access


Domain Model – Reference

• Model of a domain conceptually split
  – Domain Definition
    • Concepts identified by URIs (classes, entities, relationship types) ensure reference
    • Remains static: necessity ⇒ Rigid designators (Kripke)
  – Domain Description
    • Relationships describe concepts
    • Subject to change: possibility ⇒ Definite descriptions (Russell)


Domain Definition

• Top-down concept identification
• Achieved through
  – Manual creation based on consensus in a group
  – Extraction from community-created or peer-reviewed conceptualizations
    • Wikipedia
    • MeSH or UMLS Semantic Network


Domain Description

• Possible to do top-down extraction of the domain description, e.g. from DBPedia
• Problem: Formal concept descriptions are sparse
  – On average, DBPedia has fewer than 2 object properties per entity
• Extract descriptions (facts) bottom-up
  – Available in text, DBs, etc.
  – Domain-specific molecular structure extractors (GlycO)
  – Domain-independent IE techniques (Doozer++)


Knowledge Acquisition Approaches

• KA in a tightly connected system
  – GlycO: domain-specific biochemistry ontology
    • Manual domain definition and description
    • Partial automatic domain description
    • Domain-specific automatic validation
    • Manual validation for false negatives
• KA in a loosely connected system
  – Doozer++: general domain-model creation framework
    • Automatic domain definition, top-down concept extraction
    • Automatic domain description, bottom-up fact extraction
  – Extraction from trusted sources
  – A trusted extraction and validation procedure
    • Domain-independent community-based validation


Knowledge Acquisition Approaches

Aspect | Knowledge Engineering Approach | Traditional Extraction Approach | GlycO | Doozer++
Definition | Top-down | Bottom-up | Top-down (knowledge engineering) | Top-down: conceptually, by extraction from corpus
Description | Top-down | Bottom-up | Bottom-up, restricted by top-down definition | Bottom-up, restricted by top-down definition
Verification | Manual | Manual | Correctness: automatic; exceptions: added manually | Community-based validation


KA on the Web - Vision

• Web searches, browsing sessions or classification tasks can be seen as creating an implicit domain model
  – World view, concept coverage, facts

• Make models explicit and reusable using formal descriptions (RDF, OWL)

• Validate the contained information and share it with the community

Increase the system’s knowledge by “doing what you do”: Search, browse, click, communicate


KA in a Loosely Connected System

Scooner – Evaluation in Use: semantic browsing and retrieval, domain-independent, community-based

Doozer++
  – Domain Definition: top-down concept extraction
  – Domain Description: pattern-based fact extraction

• Linked Data
• Free text
  • Wikipedia
  • Web

Domain model creation to gradually increase the overall knowledge of the system
• User-interest driven
• Incentive to evaluate

Domain Definition

Domain Description

Validation


Domain Definition Requirements

• Identify concepts, concept labels (denotations) and concept hierarchy

• Challenge: define narrow boundaries for a domain while at the same time ensuring broad conceptual coverage within the domain


Domain definition - conceptual

• Expand and Reduce approach
  – Start with “high recall” methods
    • Exploration: full text search
    • Exploitation: graph-similarity method
    • Category growth
    • “What could be in the domain?”
  – End with “high precision” methods
    • Apply restrictions on the concepts found
    • Remove terms and categories that fall outside the dense areas of the model graph
    • “What should be in the domain?”
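Here is a minimal sketch of the expand-and-reduce idea. It assumes the candidate domain is a plain undirected concept graph, for instance Wikipedia articles and categories linked to each other. Expansion is a breadth-first search bounded by a hypothetical `max_hops` limit, which gives high recall. Reduction repeatedly drops non-seed nodes that have fewer than a hypothetical `min_degree` neighbours inside the candidate set. This is a simple stand-in for "outside the dense areas of the model graph" and gives high precision. The graph and both parameters are illustrative; they are not the actual Doozer++ scoring.

```python
from collections import deque


def expand_domain(graph, seeds, max_hops):
    """High-recall step: every node within max_hops of a seed (breadth-first search)."""
    found = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in found:
                found.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return found


def reduce_domain(graph, candidates, seeds, min_degree):
    """High-precision step: repeatedly drop non-seed nodes that have fewer than
    min_degree neighbours inside the current candidate set (sparse regions)."""
    kept = set(candidates)
    changed = True
    while changed:
        changed = False
        for node in list(kept):
            if node in seeds:
                continue
            inside = sum(1 for nb in graph.get(node, ()) if nb in kept)
            if inside < min_degree:
                kept.discard(node)
                changed = True
    return kept


# Hypothetical undirected concept graph (adjacency lists are symmetric).
graph = {
    "India": ["Pakistan", "Kashmir", "Line_of_Control", "Cricket"],
    "Pakistan": ["India", "Kashmir", "Line_of_Control"],
    "Kashmir": ["India", "Pakistan", "Line_of_Control"],
    "Line_of_Control": ["India", "Pakistan", "Kashmir"],
    "Cricket": ["India", "Bat"],
    "Bat": ["Cricket"],
}
seeds = {"India", "Pakistan", "Kashmir"}

candidates = expand_domain(graph, seeds, max_hops=2)  # "What could be in the domain?"
domain = reduce_domain(graph, candidates, seeds, min_degree=2)  # "What should be in the domain?"
print(sorted(domain))  # ['India', 'Kashmir', 'Line_of_Control', 'Pakistan']
```

The expansion reaches Cricket and Bat, but the reduction removes them. Each hangs off the dense India–Pakistan–Kashmir cluster by a single edge.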


Domain Description - Classifier

• Concept-aware
  – Use concepts and concept labels from the domain definition step
• Fact extraction as classification of concept pairs into relationship types
  – $f_{\text{class}}: C \times C \rightarrow R$
  – $R_{S,O} = \{R \mid p(R,S,O) > \varepsilon\}$
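The threshold rule for $R_{S,O}$ can be sketched as a filter over scored relationship types. The scores below are the rank 1 to 3 confidences reported for Howard Pawley :: Gary Filmon in the DBPedia sample results. The function name and the threshold value `epsilon = 0.7` are hypothetical.

```python
def relationship_set(scores, epsilon):
    """R_{S,O} = {R | p(R, S, O) > epsilon}, ranked from most to least probable.

    scores maps each relationship type R to p(R, S, O) for one concept pair (S, O).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [rel for rel, p in ranked if p > epsilon]


# Confidences for the pair (Howard_Pawley, Gary_Filmon); epsilon is an illustrative choice.
scores = {"successor": 0.799, "after": 0.768, "office": 0.686}
print(relationship_set(scores, epsilon=0.7))  # ['successor', 'after']
```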


Domain Description

• Combined language model and semantic classification model

• Language model: surface-pattern-based
  – Pattern manifestations of relationships as features
  – Open to any corpus, language-independent
  – Less computational overhead than NLP

• Semantic classification model
  – Learned or assigned concept labels
  – Semantic types to aid classification


Domain Description - Implementation

• Probabilistic vector-space model
  – Each relationship is defined by vectors of
    • Pattern probabilities
    • Domain/range probabilities
  – Each concept is grounded by its semantic types and manifested by its labels and their probabilities of identifying the concept
  – Sparse pattern representation (density ~2%)
  – White-box, easily verifiable
  – Inherently parallel
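One way to hold the sparse relationship-pattern vectors is a nested dictionary that stores only the non-zero entries. Each relationship's row can then be inspected directly (white-box) or processed independently (parallel). The nested-dict layout is an assumption, not the Doozer++ data structure. The relationships, patterns and probabilities are also made up for illustration.

```python
from typing import Dict

# relationship -> {pattern: p(R | P)}; absent entries are implicitly 0.
SparseMatrix = Dict[str, Dict[str, float]]


def prob(matrix: SparseMatrix, relationship: str, pattern: str) -> float:
    """Look up p(R | P), treating missing entries as zero."""
    return matrix.get(relationship, {}).get(pattern, 0.0)


def density(matrix: SparseMatrix) -> float:
    """Fraction of the full relationship x pattern matrix that is non-zero."""
    patterns = {p for row in matrix.values() for p in row}
    if not matrix or not patterns:
        return 0.0
    nonzero = sum(len(row) for row in matrix.values())
    return nonzero / (len(matrix) * len(patterns))


# Illustrative values only.
matrix: SparseMatrix = {
    "almaMater": {"<Subject> graduated from <Object>": 0.9,
                  "<Subject> studied at <Object>": 0.8},
    "birthPlace": {"<Subject> was born in <Object>": 0.95},
    "deathPlace": {"<Subject> died in <Object>": 0.9},
}
print(density(matrix))  # 4 non-zero cells out of 3 x 4
```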


Terminology

Symbol | Meaning | Example
S, O | Subject and Object concepts (semantic) | Kelly_Miller_(scientist), Howard_University
L_S, L_O | Subject and Object labels | “Kelly Miller”, “Howard University”
P_{L_S,L_O} | Phrase instantiating the pattern | Kelly Miller graduated from Howard University
P | Pattern | <Subject> graduated from <Object>
T_S, T_O | Semantic type of Subject or Object | Person, Educational_Institution
R | Relationship | almaMater, birthPlace


Probabilistic Classifier

Labels taken from lexicon or linked corpus

Patterns learned from free text

Semantic types: asserted in ontology or learned from linked data



Sentence in corpus:

Obama graduated in 1983 from Columbia University with a degree in political science and international relations.

How is Barack Obama related to Columbia University? p(R, Barack_Obama, Columbia_University)

(Regular classification requires multiple examples)



p(almaMater, Barack_Obama, Columbia_University) =
  p(almaMater | “<Subject> graduated in 1983 from <Object>”) *
  p(Barack_Obama | “Obama”) *
  p(Columbia_University | “Columbia University”) *
  p(almaMater | domain(person)) *
  p(almaMater | range(academic_institution))

p(almaMater, Barack_Obama, Columbia_University) = 0.9 * 0.95 * 0.95 * 0.9 * 0.97 = 0.70909425

Obama graduated in 1983 from Columbia University
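The arithmetic of the example can be reproduced directly. The five factors listed above are multiplied, which treats them as independent pieces of evidence. The dictionary keys are only descriptive labels.

```python
import math

# The five factors from the Obama / Columbia University example.
factors = {
    "p(almaMater | '<Subject> graduated in 1983 from <Object>')": 0.9,
    "p(Barack_Obama | 'Obama')": 0.95,
    "p(Columbia_University | 'Columbia University')": 0.95,
    "p(almaMater | domain(person))": 0.9,
    "p(almaMater | range(academic_institution))": 0.97,
}

p = math.prod(factors.values())  # math.prod requires Python 3.8+
print(f"{p:.8f}")  # 0.70909425
```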


Pattern Generalization

• Problem: Low recall in pattern-based IE
• Substitute terms with wild cards
  – No POS tagging, hence only “*” wild cards
  – Mirrors shortest paths through parse trees

<Subject> graduated in 1983 from <Object>

<Subject> * in 1983 from <Object>

<Subject> graduated * 1983 from <Object>

<Subject> * * 1983 from <Object>

<Subject> graduated in * from <Object>

<Subject> * in * from <Object>

<Subject> graduated * * from <Object>

<Subject> * * * from <Object>
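The generalization step can be sketched as enumerating wildcard masks over the tokens between <Subject> and <Object>. To reproduce exactly the eight variants listed above, the sketch assumes that the token directly before <Object> (here "from") is never replaced. The thesis does not state this rule explicitly, so treat it as an assumption read off the example.

```python
from itertools import product


def generalize(pattern):
    """Return all wildcard variants of a surface pattern, the original first.

    Assumption (mirroring the listed example): tokens between <Subject> and
    <Object> may be replaced by '*', except the token directly before <Object>.
    """
    tokens = pattern.split()
    subj, inner, obj = tokens[0], tokens[1:-1], tokens[-1]
    free, anchor = inner[:-1], inner[-1:]
    variants = []
    for mask in product((False, True), repeat=len(free)):
        body = ["*" if wild else tok for tok, wild in zip(free, mask)]
        variants.append(" ".join([subj, *body, *anchor, obj]))
    return variants


for variant in generalize("<Subject> graduated in 1983 from <Object>"):
    print(variant)
```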


Learning p(R|P)

• Distantly supervised training
• Collect pattern frequencies for training examples
  – Fact triples <S, R, O>, e.g. from Linked Data (DBPedia, UMLS)
  – Manifestations of facts in text in the form of patterns (corpus, e.g. Web, Wikipedia, MedLine)

• For relationship R_i, aggregate pattern vectors representing <*, R_i, *>
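Here is a minimal sketch of distantly supervised pattern collection. For each known fact <S, R, O>, every corpus sentence that contains a label of S followed by a label of O contributes the text between them as a pattern for R. Several choices are simplifying assumptions for illustration: exact substring matching, the subject-before-object restriction, and the sample facts, labels and sentences.

```python
from collections import Counter, defaultdict


def extract_pattern(sentence, subj_label, obj_label):
    """Return '<Subject> ... <Object>' when subj_label occurs before obj_label."""
    s = sentence.find(subj_label)
    o = sentence.find(obj_label)
    if s == -1 or o == -1 or s + len(subj_label) > o:
        return None
    middle = sentence[s + len(subj_label):o].strip()
    return f"<Subject> {middle} <Object>" if middle else None


def collect_pattern_counts(facts, labels, corpus):
    """Aggregate pattern frequencies per relationship, i.e. vectors for <*, R, *>."""
    counts = defaultdict(Counter)
    for subj, rel, obj in facts:
        for sentence in corpus:
            found = set()  # count each pattern at most once per sentence and fact
            for ls in labels[subj]:
                for lo in labels[obj]:
                    pattern = extract_pattern(sentence, ls, lo)
                    if pattern is not None:
                        found.add(pattern)
            counts[rel].update(found)
    return counts


# Illustrative training facts, concept labels and corpus sentences.
facts = [
    ("Kelly_Miller_(scientist)", "almaMater", "Howard_University"),
    ("Barack_Obama", "almaMater", "Columbia_University"),
]
labels = {
    "Kelly_Miller_(scientist)": ["Kelly Miller"],
    "Howard_University": ["Howard University"],
    "Barack_Obama": ["Obama", "Barack Obama"],
    "Columbia_University": ["Columbia University"],
}
corpus = [
    "Kelly Miller graduated from Howard University.",
    "Obama graduated in 1983 from Columbia University with a degree in political science.",
]
counts = collect_pattern_counts(facts, labels, corpus)
print(dict(counts["almaMater"]))
```

Only positive examples, meaning known facts, contribute counts.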


Learning p(R|P) – naïve

• For each vector R_i containing pattern frequencies for relationship R_i, compute
  $p(P_j \mid R_i) = \dfrac{\#(P_j, R_i)}{\sum_k \#(P_k, R_i)}$
  – i.e. the number of occurrences of Pattern_j with terms denoting some <S, O> ∈ R_i, normalized by all pattern occurrences for R_i


Learning p(R|P) – naïve

• Uniform distribution of relationships assumed
  – As the number of relationship types grows, the prior of each type goes towards 0
  – Normalize the probabilities over the column vector to get p(R_i|P_j)

• Vector space representation
  – Relationship-pattern matrix
  – $R2P_{ij} = p(R_i \mid P_j)$
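The two normalization steps can be written out as follows. First, each relationship's pattern counts are turned into $p(P_j \mid R_i)$. Then each pattern column is normalized, which, under a uniform prior over relationships, yields $p(R_i \mid P_j)$ by Bayes' rule. The counts and the relationship `residence` are invented for illustration.

```python
from collections import defaultdict


def p_pattern_given_relation(counts):
    """Row step: p(P_j | R_i) = #(P_j, R_i) / sum_k #(P_k, R_i)."""
    result = {}
    for rel, row in counts.items():
        total = sum(row.values())
        result[rel] = {pat: n / total for pat, n in row.items()}
    return result


def p_relation_given_pattern(p_pat_given_rel):
    """Column step: with a uniform prior over relationships,
    p(R_i | P_j) = p(P_j | R_i) / sum_l p(P_j | R_l)."""
    column_sums = defaultdict(float)
    for row in p_pat_given_rel.values():
        for pat, value in row.items():
            column_sums[pat] += value
    return {
        rel: {pat: value / column_sums[pat] for pat, value in row.items()}
        for rel, row in p_pat_given_rel.items()
    }


# Hypothetical pattern frequencies collected by distant supervision.
counts = {
    "birthPlace": {"<Subject> was born in <Object>": 8, "<Subject> lived in <Object>": 2},
    "residence": {"<Subject> lived in <Object>": 6, "<Subject> moved to <Object>": 4},
}
R2P = p_relation_given_pattern(p_pattern_given_relation(counts))
print(R2P["residence"]["<Subject> lived in <Object>"])  # about 0.75
```

Because "lived in" occurs with both relationships, they split its probability mass. This is the competition between similar types that the following slides address.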


Problem: Relationship Similarities

• Extensional similarity
  – Semantically different relationships can share Subject-Object pairs in training data
• Intensional similarity
  – Overlap and entailment of relationship types
  – Types should not be seen as discrete
    • E.g. physical_part_of → part_of
• A priori unknown which types overlap unless a formal description is available
  – Semantically similar types compete for the same patterns
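Extensional similarity can be quantified, for example, as the Jaccard overlap of the Subject-Object pairs that two relationship types share in the training data. Jaccard is an assumed choice here, and the pairs are placeholders.

```python
def extensional_similarity(pairs_a, pairs_b):
    """Jaccard overlap |A & B| / |A | B| of two sets of <S, O> training pairs."""
    a, b = set(pairs_a), set(pairs_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Placeholder pairs: one person was born and died in the same place.
birth_place = {("Person_A", "City_X"), ("Person_B", "City_Y"), ("Person_C", "City_Z")}
death_place = {("Person_A", "City_X"), ("Person_D", "City_W")}
print(extensional_similarity(birth_place, death_place))  # 0.25
```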


Relationship similarities

Pertinence: measure similarity between pattern vectors as an approximation of intensional similarity
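Two relationships can be compared through their pattern vectors. The sketch uses cosine similarity over sparse vectors; this is a common choice but an assumption here. It does not reproduce how the Pertinence measure folds this similarity into p(R|P). The vectors are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse pattern vectors (dict: pattern -> weight)."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative pattern frequencies.
part_of = {"<Subject> in the right <Object>": 3, "<Subject> of the <Object>": 4}
has_location = {"<Subject> in the right <Object>": 4, "<Subject> of the <Object>": 3}
birth_place = {"<Subject> was born in <Object>": 5}

print(cosine(part_of, has_location))  # high: the two share their patterns
print(cosine(part_of, birth_place))   # 0.0: no shared patterns
```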


Pertinence for Relationships

Do not punish the occurrence of the same pattern with relationship types that are intensionally similar, but extensionally dissimilar

Reduce impact of extensionally similar relations



Pertinence Example

Pattern: <Subject> in the right <Object>

Relationship | p(R|P)
biological_process_has_associated_location | 0.968371381
disease_has_associated_anatomic_site | 0.880452774
part_of | 0.622532958
has_finding_site | 0.561041318
has_location | 0.537424451
has_direct_procedure_site | 0.363832078
Sum | 3.933654958

Note: This never causes p(R,S,O) > 1
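The note can be checked against the numbers in the table. The p(R|P) values for this pattern sum to well over 1, so they are not a distribution over relationship types, yet each value stays in [0, 1]. Since p(R,S,O) is a product of such factors, as in the almaMater example, it cannot exceed 1. The label and type factors below are borrowed from that example purely for illustration.

```python
import math

# p(R | P) for the pattern "<Subject> in the right <Object>" (values from the table).
p_r_given_p = {
    "biological_process_has_associated_location": 0.968371381,
    "disease_has_associated_anatomic_site": 0.880452774,
    "part_of": 0.622532958,
    "has_finding_site": 0.561041318,
    "has_location": 0.537424451,
    "has_direct_procedure_site": 0.363832078,
}

total = sum(p_r_given_p.values())
print(round(total, 4))  # 3.9337: not a probability distribution over R

# Label and type factors reused from the almaMater example, for illustration only.
other_factors = [0.95, 0.95, 0.9, 0.97]
scores = {rel: p * math.prod(other_factors) for rel, p in p_r_given_p.items()}
print(max(scores.values()) <= 1.0)  # True: every factor is at most 1
```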


Similarities between relationships


Pertinence evaluation

[Figure: Precision-recall curves with Pertinence vs. No Pertinence; x-axis: Recall, y-axis: Precision]


Fact extraction evaluation - DBPedia

[Figure: Precision / Recall vs. confidence threshold]

Strict evaluation: only the 1st-ranked extracted relation is compared to the gold standard. Averaged over 107 relation types.

60% training set, 40% testing; DBPedia Infobox fact corpus, Wikipedia text corpus.
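Here is a sketch of the strict evaluation, under stated assumptions. A pair counts as extracted when its rank 1 confidence reaches the threshold. Precision is correct/extracted, and recall is correct/all gold pairs. The `top_k` parameter is a hypothetical relaxation added only for contrast; `top_k=1` is the strict setting. The rows are taken from the DBPedia sample results.

```python
def precision_recall(predictions, gold, threshold, top_k=1):
    """predictions: pair -> list of (relation, confidence), best first.
    gold: pair -> gold-standard relation. top_k=1 is the strict evaluation."""
    extracted = correct = 0
    for pair, ranked in predictions.items():
        if not ranked or ranked[0][1] < threshold:
            continue  # nothing extracted for this pair at this threshold
        extracted += 1
        if gold.get(pair) in [rel for rel, _ in ranked[:top_k]]:
            correct += 1
    precision = correct / extracted if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Rows from the DBPedia sample results (gold relation, ranked suggestions).
gold = {
    ("Howard Pawley", "Gary Filmon"): "after",
    ("Mulan", "Tarzan"): "after",
    ("The Crystal City", "Orson Scott Card"): "author",
    ("Bob Nystrom", "Stockholm"): "birthplace",
}
predictions = {
    ("Howard Pawley", "Gary Filmon"): [("successor", 0.799), ("after", 0.768), ("office", 0.686)],
    ("Mulan", "Tarzan"): [("nextSingle", 0.603), ("followedBy", 0.533), ("after", 0.416)],
    ("The Crystal City", "Orson Scott Card"): [("artist", 0.625), ("author", 0.617), ("writer", 0.583)],
    ("Bob Nystrom", "Stockholm"): [("cityOfBirth", 0.677), ("birthplace", 0.513)],
}

print(precision_recall(predictions, gold, threshold=0.6))           # (0.0, 0.0)
print(precision_recall(predictions, gold, threshold=0.6, top_k=3))  # (1.0, 1.0)
```

Under the strict criterion, every one of these rows counts as an error. This holds even where the rank 1 suggestion (successor, cityOfBirth) is a near-synonym of the gold relation, which makes the strict setting conservative.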


Sample results (DBPedia)

Subject :: Object | Suggested relationship | Extracted rank 1 (Rel; Confidence) | Rank 2 | Rank 3
Howard Pawley :: Gary Filmon | after | successor; 0.799 | after; 0.768 | office; 0.686
Mulan :: Tarzan | after | nextSingle; 0.603 | followedBy; 0.533 | after; 0.416
Species Deceases :: Midnight Oil | artist | producer; 0.761 | artist; 0.719 | genre; 0.467
The Crystal City :: Orson Scott Card | author | artist; 0.625 | author; 0.617 | writer; 0.583
Horatio Allen :: William Maxwell | before | predecessor; 0.629 | before; 0.475 |
Basdeo Panday :: Trinidad & Tobago | birthplace | deathPlace; 0.658 | birthplace; 0.658 | nationality; 0.330
Bob Nystrom :: Stockholm | birthplace | cityOfBirth; 0.677 | birthplace; 0.513 |
Beccles railway station :: Suffolk | borough | district; 0.772 | borough; 0.770 | friend; 0.749


Fact extraction evaluation - UMLS

[Figure: Precision / Recall vs. confidence threshold]

Strict evaluation: only the 1st-ranked extracted relation is compared to the gold standard. Averaged over ~100 relation types.

60% training set, 40% testing; UMLS fact corpus, MedLine text corpus.


Sample results (UMLS)

Subject :: Object | Suggested relationship | Extracted rank 1
Teeth :: poisoning, fluoride | finding_site_of | finding_site_of
polyps :: polyp of cervix nos (disorder) | associated_with | associated_with
neck of uterus :: polyp of cervix nos (disorder) | location_of | finding_site_of
benign neoplasms :: polyp of colon | related_to | associated_with
brain ischemia :: brain | has_finding_site | location_of
gastrointestinal tract :: polyp of colon | is_primary_anatomic_site_of_disease | location_of
gamete structure (cell structure) :: polyvesicular vitelline tumor | is_normal_cell_origin_of_disease | is_normal_cell_origin_of_disease


Comparison – DBPedia corpus

• Mintz: extraction of 102 relationship types from Freebase
• Doozer: 107 from DBPedia

M. Mintz, S. Bills, R. Snow, and D. Jurafsky, “Distant supervision for relation extraction without labeled data,” in ACL 2009.

[Figure: Precision-recall curves for Mintz-POS, Mintz-NLP, Doozer++ (R) and Doozer++ (P); x-axis: Recall, y-axis: Precision]

(R) Recall-oriented, using pattern generalization

(P) Precision-oriented, no generalization


Evaluate Ad-Hoc Model Creation

• On-demand creation of models

Domain | Query | Number of concepts | Precision (domain definition)
Semantic Web | “Semantic Web” OWL ontologies RDF | 143 | 0.98
Harry Potter | “Harry Potter” dumbledore gryffindor slytherin | 134 | 0.98
Beatles | Beatles "John Lennon" "Paul McCartney" song | 250 | 0.99
India-Pakistan Relations | India Pakistan Kashmir | 129 | 0.99
US Financial crisis – TARP | tarp "financial crisis" "toxic assets" | 146 | 0.93
German Chancellors | German chancellors "Angela Merkel" "Helmut Kohl" | 124 | 0.91


Ad-Hoc Model Creation - Evaluation


Ad-Hoc Model Creation - Evaluation

Relative Recall

Recall with respect to possible extraction, i.e. the maximum number of extracted facts marks 100% recall.
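Relative recall, as defined above, normalizes each run by the best run. Which runs are compared (for instance, different confidence thresholds) is an assumption here, and the counts are hypothetical.

```python
def relative_recall(extracted_counts):
    """Facts extracted by each run divided by the maximum extracted by any run."""
    best = max(extracted_counts.values())
    if best == 0:
        return {run: 0.0 for run in extracted_counts}
    return {run: n / best for run, n in extracted_counts.items()}


# Hypothetical numbers of extracted facts per confidence threshold.
counts = {0.3: 400, 0.5: 250, 0.7: 100}
print(relative_recall(counts))  # {0.3: 1.0, 0.5: 0.625, 0.7: 0.25}
```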


Related Work

[Table: Mintz, SOFIE, Turney, Structural, Open IE, Supervised, Distant Supervision, Coupled learner, Surface patterns only, Pertinence for semantic similarity]


Main Differences

• Surface patterns only
• Only positive training examples
• Pertinence measure for semantic similarity
• Concept-aware: start with defined concepts
• Include background knowledge in probabilistic classification instead of rule-based reasoning


Related work

• Pattern-based fact extraction
  – E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. In JCDL, 2000.

  – F. M. Suchanek, M. Sozio, and G. Weikum. SOFIE: A Self-Organizing Framework for Information Extraction. WWW 2009.

– T. M. Mitchell, J. Betteridge, A. Carlson, E. Hruschka, and R. Wang. Populating the Semantic Web by Macro-Reading Internet Text. ISWC 2009.

– M. Pasca, D. Lin, J. Bigham, A. Lifchits, and A. Jain. Organizing and searching the world wide web of facts-step one: the one-million fact extraction challenge. In AAAI 2006.


Related work

• Relationship-pattern computations
  – P. D. Turney and P. Pantel. From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37, 2010.

– P. D. Turney. Expressing implicit semantic relations without supervision. In ACL 2006


Summary Fact extraction

• Pattern-based fact extraction with generalization and Pertinence achieves competitive precision and recall while being computationally feasible for large-scale extraction
  – Pertinence computation can also be a preprocessing step for other ML techniques
• Different types of background knowledge incorporated into one statistical framework
  – Combined language model and semantic model


Application and Knowledge Validation

Scooner: Semantic browsing and retrieval – Evaluation in Use

Doozer++
  – Hierarchy extraction
  – Pattern-based fact extraction

• 18 million MedLine publications/abstracts

• UMLS Metathesaurus

• Wikipedia

Example: Domain model as a basis for research in the area of human cognitive performance.


Domain Definition – Extracted Hierarchy

A hierarchy extracted for a cognitive science domain model.

The keyword description given to the system was a collection of terms relevant to human performance and cognition.


Domain Description: Connect Concepts



Expert Evaluation of Facts in the Model

[Figure: Fraction of facts per expert score bin (scores 1–9), with cumulative incorrect, cumulative correct and cumulative interesting curves]

1-2: Information that is overall incorrect

3-4: Information that is somewhat correct

5-6: Correct general information

7-9: Correct information not commonly known
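The score ranges of the legend can be applied mechanically to a set of expert ratings to obtain the fraction of facts in each category. The ratings below are invented for illustration.

```python
from collections import Counter


def score_category(score):
    """Map an expert score (1-9) to the legend categories."""
    if not 1 <= score <= 9:
        raise ValueError(f"score out of range: {score}")
    if score <= 2:
        return "incorrect"
    if score <= 4:
        return "somewhat correct"
    if score <= 6:
        return "correct, general"
    return "correct, not commonly known"


def category_fractions(scores):
    """Fraction of rated facts falling into each category."""
    counts = Counter(score_category(s) for s in scores)
    return {category: n / len(scores) for category, n in counts.items()}


ratings = [1, 5, 6, 8, 9, 3, 5, 7]  # hypothetical expert scores
print(category_fractions(ratings))
```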


Extractor Confidence vs. Correctness

• Analysis shows that the highest-quality extractions have the highest confidence, but incorrectly extracted facts can also have high confidence

High-quality patterns as well as some noise patterns have high indicative power.


Extractor Confidence vs. Correctness

• Many facts deemed interesting were extracted based on highly specialized patterns in the long tail of the frequency distribution.

• Noisy patterns also tend to occupy this space


Sources of Errors

• Extracted relationship too specific, or formally incorrect but metaphorically correct
  – <Interpeduncular_Cistern disease_has_associated_anatomic_site Cerebral_peduncle> is incorrect
    • Interpeduncular Cistern is not a disease. However, it does have the associated anatomic site Cerebral peduncle.

• Incorrect directionality
  – <Pituitary_Gland sends_output_to Supraoptic_nucleus> should be <Supraoptic_nucleus sends_output_to Pituitary_Gland>
    • Direction in text is often expressed in the context rather than in the immediate pattern


Validation

• Extracted statements need to be validated to be considered knowledge
  – Explicit validation, e.g. thumbs up/down
  – Implicit validation, e.g. by analyzing click streams


Explicit Validation

• Certainty of reference
  – I.e. we know exactly which statement was validated

• Validator credentials can be obtained
  – E.g. a small community of experts may evaluate

• Extra work
  – Explicit validation is a task that is consciously performed


Implicit Validation

• Find indications of correctness or incorrectness based on the way users interact with the presented information
  – Every action taken on a piece of information is recorded and analyzed
  – The cumulative behavior of the users gives an indication of which propositions are correct or interesting


Implicit Validation

• Examples of implicit community validation
  – Games with a purpose (L. von Ahn)
  – Google search rankings

• Scooner semantic browser
  – Browse literature along facts in a model
  – Browsing trails suggest correct extraction


Implicit Validation

• A fact is browsed very often by different users.
  – The fact is interesting to many users.
  – The fact is surprising and interesting, but may be incorrect.

• A user follows a trail of multiple fact triples through a variety of documents.
  – The facts that were browsed have a high probability of being correct, and support is added to the triples.
  – If the trail was longer than suggested by a small-world phenomenon, the initial triples may have been incorrect but led to interesting ones. For this reason, only the last k triples of the trail should garner support, or the support should increase for the last k triples in the trail.
  – The last triple in the trail may have been incorrect and led to browsing results that caused the user to stop browsing. For this reason, the last triple of the trail should be treated with caution.
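The trail heuristics above can be sketched as a support update applied to each recorded trail. All parameters are hypothetical:

* `small_world_length` stands in for the trail length suggested by a small-world phenomenon.
* `k` is the number of trailing triples that receive support on longer trails.
* `last_weight` discounts the final triple so that it is treated with caution.

```python
from collections import defaultdict


def update_support(support, trail, k=3, small_world_length=6, last_weight=0.5):
    """Add support to the fact triples of one browsing trail (in browsing order).

    Trails longer than small_world_length only credit their last k triples;
    the final triple of a trail receives a reduced increment (last_weight).
    """
    credited = trail[-k:] if len(trail) > small_world_length else trail
    for i, triple in enumerate(credited):
        support[triple] += last_weight if i == len(credited) - 1 else 1.0
    return support


# Hypothetical triples (subject, relationship, object).
short_support = defaultdict(float)
update_support(short_support, [("A", "r1", "B"), ("B", "r2", "C"), ("C", "r3", "D")])

long_trail = [(f"E{i}", "r", f"E{i + 1}") for i in range(8)]
long_support = update_support(defaultdict(float), long_trail)
print(sorted(long_support.items()))
```

Support accumulates across trails, so triples that many users pass through would rise above those seen only once.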


Validation “through use”

Enter search terms

Choose entity of interest

Browse extracted facts

Choose relevant literature that supports the fact


Validation “through use”

Fact trails are recorded

Find another interesting fact


Validation “through use”

Path suggests that at least the first 2 triples are factually correct


Browsed Facts Examples


Related work

• Evaluation and Use
  – E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’06, page 19, 2006.

– A. Das, M. Datar, A. Garg, and S. Rajaram. Google News Personalization: Scalable Online Collaborative Filtering. In Proceedings of the 16th international conference on World Wide Web, page 280. ACM, 2007.


Summary Knowledge Acquisition

• The model actually reflects what the user is interested in at the point of creation ⇒ Willingness to help validate facts
  – Applications allow for implicit and explicit evaluation
• Validated statements can be merged with existing knowledge ⇒ Automated acquisition completed ⇒ Individual-driven KA improved the overall system

• R. Kavuluru, C. Thomas et al. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains. IHI 2012

• Amit Sheth, Christopher Thomas, Pankaj Mehra, 'Continuous Semantics to Analyze Real-Time Data', IEEE IC, Nov./Dec. 2010

• C. Thomas et al. Improving Linked Open Data through On-Demand Model Creation. Web Science Conference, 2010.

• C. Thomas et al. Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction. WIC 2008.

http://knoesis.wright.edu/library/download/STM10-IC-Continuous-Semantics.pdf


Future Directions

• Active learning to improve classification
  – Easy in a tightly connected system (e.g. NELL)
  – Feedback mechanism for loosely connected systems
• Improve depth of classification
  – Augment domain description with learned concept hierarchies from text (e.g. Navigli)
• Knowledge management for background knowledge
  – Belief updates
  – Model evolution


Contributions

Conceptual Knowledge: Ontologies, LoD

Taxonomy extraction [WI1, WebSci, WebSem1]

Event modeling [IEEE-IC]

Relationship/Fact/Event extraction [IHI, WebSem1, IEEE-IC, WebSci]

Information Quality [WI2]

Social processes for content creation [CHB]

Textual Information: Wikipedia, Web

Knowledge merging/Ontology alignment [AAAI, WebSem2, SWSWPC]

Social processes for knowledge validation [IHI, WebSci, CHB]

Knowledge Representation [IJSWIS, CR, FLSW]

Ontology design [WWW, FOIS]


Journal/Conference Publications

[WebSem1] C. Thomas, P. Mehra, A. Sheth, W. Wang, G. Weikum. Automatic domain model creation using pattern-based fact extraction. Submitted to Journal of Web Semantics.

[IHI]R. Kavuluru, C. Thomas, A. Sheth, V. Chan, W. Wang, A. Smith, A. Sato and A. Walters. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains. IHI 2012 - 2nd ACM SIGHIT International Health Informatics Symposium, January 28-30, 2012.

[IEEE-IC] Amit Sheth, Christopher Thomas, Pankaj Mehra, 'Continuous Semantics to Analyze Real-Time Data', IEEE Internet Computing, vol. 14, no. 6, pp. 84-89, Nov./Dec. 2010, doi:10.1109/MIC.2010.137

[WebSci] C. Thomas, W. Wang, P. Mehra and A. Sheth. What Goes Around Comes Around: Improving Linked Open Data through On-Demand Model Creation. Web Science Conference, 2010.

[WI1] C. Thomas, P. Mehra, R. Brooks, and A. Sheth. Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction. Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on, 1:496–502, 2008.



Journal/Conference Publications

[WI2] C. Thomas and A. Sheth. Semantic Convergence of Wikipedia Articles. In Proceedings of the 2007 IEEE/WIC International Conference on Web Intelligence, pages 600–606, Washington, DC, USA, November 2007. IEEE Computer Society.

[WWW] S. S. Sahoo, C. Thomas, A. Sheth, W. S. York, and S. Tartir. Knowledge Modeling and its Application in Life Sciences: A Tale of two Ontologies. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 317–326, New York, NY, USA, 2006. ACM Press.

[FOIS] C. Thomas, A. Sheth, and W. York. Modular Ontology Design Using Canonical Building Blocks in the Biochemistry Domain. In Proceeding of the 2006 conference on Formal Ontology in Information Systems: Proceedings of the Fourth International Conference (FOIS 2006), pages 115–127, Amsterdam (NL), 2006. IOS Press.

[AAAI] P. Doshi and C. Thomas. Inexact matching of ontology graphs using expectation-maximization. In AAAI’06: proceedings of the 21st national conference on Artificial intelligence, pages 1277–1282. AAAI Press, 2006.


Publications

[CHB] C. Thomas and A. Sheth. Web Wisdom - An Essay on How Web 2.0 and Semantic Web can foster a Global Knowledge Society. Computers in Human Behavior, Elsevier.

[WebSem2] P. Doshi, R. Kolli, and C. Thomas. Inexact matching of ontology graphs using expectation-maximization. Web Semantics: Science, Services and Agents on the World Wide Web, 7(2):90–106, 2009.

[IJWGS] V. Kashyap, C. Ramakrishnan, C. Thomas, and A. Sheth. Taxaminer: an experimentation framework for automated taxonomy bootstrapping. International Journal of Web and Grid Services, 1(2):240–266, 2005.

[IJSWIS] A. P. Sheth, C. Ramakrishnan, and C. Thomas. Semantics for the semantic web: The implicit, the formal and the powerful. Int. J. Semantic Web Inf. Syst., 1(1):1–18, 2005.

[CR] S. Sahoo, C. Thomas, A. Sheth, C. Henson, and W. York. GLYDE: an expressive XML standard for the representation of glycan structure. Carbohydrate Research, 340(18):2802–2807, 2005.


Other Publications

Workshop Publications

[SWLS] A. Sheth, W. York, C. Thomas, M. Nagarajan, J. Miller, K. Kochut, S. Sahoo, and X. Yi. Semantic Web technology in support of Bioinformatics for Glycan Expression. In W3C Workshop on Semantic Web for Life Sciences, pages 27–28, 2004.

[SWSWPC] N. Oldham, C. Thomas, A. Sheth, and K. Verma. METEOR-S Web Service Annotation Framework with Machine Learning Classification. Semantic Web Services and Web Process Composition, pages 137–146, 2005, Springer.

Book Chapters

[FLSW] C. Thomas and A. Sheth. On the expressiveness of the languages for the semantic web - making a case for a little more. Fuzzy Logic and the Semantic Web, pages 3–20, 2006.

Patent

[PAT] P. Mehra, R. Brooks and C. Thomas. ONTOLOGY CREATION BY REFERENCE TO A KNOWLEDGE CORPUS. Pub.No. US 2010/0280989 A1


• Research
  – KR
  – Domain model extraction / IE

• Collaborations
  – Complex Carbohydrate Research Center at UGA
  – HP Labs Palo Alto
  – Human Performance Directorate, AFRL

• Proposals
  – HP Incubation & Innovation grant for Doozer++
  – AFRL grant largely based on Doozer++
  – NSF proposal submitted with “very good” reviews

• Tools and Ontologies
  – GlycO
  – GlycoViz
  – Doozer++
  – Scooner


Thank you!

Gerhard Weikum

Shaojun Wang

Pascal Hitzler

Pankaj Mehra

Amit Sheth

Thanks to all Kno.e.sis Center members, past and present

