
Group Members and Contact Information:
● Amar Viswanathan (Email: [email protected])
● Sabbir Rashid (Email: [email protected], Phone Number: (508) 484-3232)
● Lisheng Ren (Email: [email protected], Phone Number: (518) 423-9925)
● Ian Gross (Email: [email protected], Phone Number: (516) 765-1500)

I. Use Case Description
Use Case Name: Knowledge Graph Evaluation System
Use Case Identifier: OE2017-KGCS-01.2
Source: Amar Viswanathan
Point of Contact: Amar Viswanathan, [email protected]
Creation / Revision Date: Created 02/04/2017 / Revised 02/25/2017
Associated Documents: Software documentation, links to evaluation

II. Use Case Summary

Goal: Information Extraction (IE) toolkits extract information in the form of entities, events and relationships using entity, event and relationship extraction procedures, respectively. Since these are independent tasks, the outputs are disparate documents that are evaluated individually for precision and recall. While precision and recall capture statistical accuracy, they miss obvious semantic errors. If these outputs are converted to an RDF-based Knowledge Graph, the added semantics can be used to look for such errors. Thus, the goal of this project is to detect and evaluate inconsistencies or potentially incorrect labels in the resulting Knowledge Graph by using a supporting Ontology to identify potential errors.

Requirements: In order to meet the requirements of our goal, we must develop a framework for Knowledge Graph (KG) evaluation. This will include an Ontology that defines the schema for creating KG triples from the disparate extractions. While the extraction framework does not have to be created (an existing framework such as OLLIE or ODIN will be used), the supporting ontology will have to be engineered. The Ontology will be populated with terms from the vocabulary of the individual extraction tasks, and it will also include terms that provide the schema necessary for integrating the outputs.

Scope: The scope of the system involves evaluating a Knowledge Graph that was generated from an IE toolkit for inconsistencies. The targeted audience for such a system would include those interested in evaluating the semantic capabilities of IE toolkit outputs. This may include end users, IE developers/evaluators or ontology engineers.

As different IE tools have outputs in different formats, the initial scope of the system is to pick one tool (in this case, ODIN) for a particular domain (in this case, Bio-Med), combine all the outputs it produces into a meaningful representation, i.e. map them to standard terms or add semantics, and build an RDF/RDFS Knowledge Graph. Detecting inconsistent labels would be a simple task in that graph, from which we would then be able to add axioms and rules that help detect more 'semantic' errors that the IE tools did not detect. Furthermore, we can evaluate each of the extracted entities for completeness or determine whether the instances are uniformly extracted.

Priority: This would be developed and improved over the class timeline.
Stakeholders: Course Instructors and Ontology Engineers

Deborah McGuinness, 02/26/17,
for the next update version, please do it in review mode or highlight updates in yellow or something that makes it easy to find
Deborah McGuinness, 02/26/17,
course instructors are just graders not stakeholders
Deborah McGuinness, 02/26/17,
if and when we go to write this up for publication i think we want to expand to include knowledge graphs that may be partially manually generated or even may be partially curated by humans.
Deborah McGuinness, 02/26/17,
i modified this portion since i thought your use of derive was as in deductive closure that would be confusing to ontology literate people
Deborah McGuinness, 02/26/17,
dlm added - your rules may be heuristic or incomplete so in reality i think you will be identifying potential incorrect labels

Description: In this use case we convert the outputs of the REACH task (Mihai Surdeanu's Clulab), which are specified in the FRIES format. The extraction system performs IE on publications from PubMed and other biomedical-domain conferences. These outputs are present as events, entities, sentences, relationships, passages and contexts.

Evaluating a Knowledge Extraction system is generally done through the lens of precision, recall and the F1 measure. However, with these systems aiming to create Knowledge Graphs with accurate information, it is worth exploring measures of correctness for such graphs. Since the output of these systems is rarely in any form other than XML documents, it is quite difficult for developers and users to figure out the semantic inadequacies of the IE system. Under a closed-world assumption with a set of predefined rules for populating instances, we can first check whether certain entity labels are incorrect. Given a set of labels and classes, can we use a predefined set of rules to figure out whether these labels and classes are assigned inconsistently?

Illustrative Example: An example rule could be that "CLINTON instanceOf PERSON" for a particular dataset. Any "CLINTON" that is extracted as a "LOCATION" instance then becomes a trigger for an anomaly. This kind of rule will be added as a disjointness axiom. The ontology can also be used to add other axioms to check boundary conditions, subtype relations, etc. Our goal is to build such axioms for the REACH outputs.
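As a rough sketch, assuming PERSON and LOCATION are declared disjoint and using a hypothetical ex: namespace rather than the project's actual vocabulary, such an anomaly could be surfaced with a SPARQL query over the generated graph:

  PREFIX ex: <http://example.org/kges#>
  # Flag individuals (such as ex:CLINTON) that the extractions type with two
  # classes the ontology declares disjoint, here PERSON and LOCATION.
  SELECT ?individual WHERE {
    ?individual a ex:PERSON .
    ?individual a ex:LOCATION .
  }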

In addition to checking for correctness, such a knowledge graph can also give quick summaries of the entities, relationships and events. This is achieved by writing simple SPARQL queries (e.g. SELECT (COUNT(*) ...)) and sending them to the designated triple store (Virtuoso).
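For instance, summary counts of the kind described here could take roughly the following shape (no project-specific vocabulary is assumed):

  # Total number of triples in the extracted graph.
  SELECT (COUNT(*) AS ?triples)
  WHERE { ?s ?p ?o . }

  # Number of instances per extracted class, largest classes first.
  SELECT ?class (COUNT(?instance) AS ?instances)
  WHERE { ?instance a ?class . }
  GROUP BY ?class
  ORDER BY DESC(?instances)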

Actors / Interfaces: The primary actors of the service are the Evaluation Interface and the End User. Secondary actors may include:

- Course Instructors
- Students
- IE Developers
- Task Evaluators
- Ontology Engineers
- Data Source (Knowledge Graph)
- IE Toolkit

Pre-conditions: The presence of outputs from an IE toolkit. Furthermore, an ontology (created as part of the system) is required, which would be used to drive the evaluation. This would include terms extracted from the IE output, as well as predetermined schema relations.

Post-conditions: Inconsistencies found will be reported in a document and displayed by a visualization tool.

Triggers: The user would log into the system, choose a knowledge graph and then look at which rules are violated. These violations will be reported as a histogram.

Performance Requirements: Java 8 compliant systems, 8 GB of RAM for processing, 3.2 GB of disk storage.

Assumptions: The output Knowledge Graph generated from the IE toolkit contains errors or inconsistencies.

Open Issues: Can inconsistencies be corrected automatically? How does one write rules for this?

Deborah McGuinness, 02/26/17,
i think you want to have the option of reporting in a few forms - not just a histogram
Deborah McGuinness, 02/26/17,
give an example form of the inconsistency you will report in your worked through example below. (if it is not already there)
Deborah McGuinness, 02/26/17,
you do not need to include us in the actors. you might include reviewers
Deborah McGuinness, 02/26/17,
really? you are going to have instance level content like clinton in your disjointness rules? i was expecting classes in the disjointness axioms
Deborah McGuinness, 02/26/17,
do some more work on this example for the next revision. it is fine if you want to have it in the domain of persons and locations or in biomedical example . later you will need a biomedical example
Deborah McGuinness, 02/26/17,
this will be impractical - but you dont have to have a complete set of rules - just a useful set of rules

III. Usage Scenarios

Scenario 1: Mary

Mary is a knowledge graph evaluator from 3M. She has been working on a knowledge graph of their various products and assets. She would like to get a summary of how many entities of a particular class exist. Mary would submit her RDF graph to the application. The application would utilize its evaluation service to check for inconsistencies against the ontology. To work with her RDF graph, Mary has updated the ontology with entity relations for adhesive products. Once complete, Mary can issue a SPARQL query through the visualization tool. In this case, she wants to check the product 3M™ Optically Clear Adhesive 9483. The service indicates that there is an inconsistency in regards to the product form. It is listed as a roll, but another entity makes a connection to the 9483 product as a sticky note. The product may be used in the sticky note adhesive, but that is not the form it takes.
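A hedged sketch of the kind of query Mary might issue; the product IRI and the hasProductForm property below are invented for illustration and are not part of any real 3M vocabulary:

  PREFIX ex: <http://example.org/products#>
  # List every product-form assertion recorded for the 9483 adhesive so that
  # conflicting forms (roll vs. sticky note) become visible.
  SELECT ?form WHERE {
    ex:OpticallyClearAdhesive9483 ex:hasProductForm ?form .
  }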

Scenario 2: Mark

Mark is a graduate student at TWC and he works on the Jefferson Project. His team has created an ontology covering water quality, tools and facilities related to the project. The team has noticed a few odd connections in the ontology, so Mark has been tasked with determining the correctness of the ontology. He submits his RDF knowledge graph and makes some minor adjustments to the ontology to better align with the terms that the Jefferson Project uses. The application provides inconsistency results. The portion he is interested in is the overall correctness of the knowledge graph. He looks at the generated histogram of accuracy for each provided term. Mark may notice anomalies for a particular class. He can then check the system to see which rules that class violates so that this can be corrected. The system also provides the general statistic of what percentage of the entire knowledge graph matches incorrect connections as described in the ontology.

Scenario 3: Jack

Jack is a QA engineer working on a medical recommendation system. He has "Reach output on the Open Access subset of PubMed" as his data source. The QA system he built is not working well, and he thinks it might be due to some inconsistency in the database. There are several rules he believes to be true, including "a drug is not a disease". He inserts his medical knowledge into the ontology of the Graph Evaluation system. Once complete, he submits a SPARQL query to the system asking for the accuracy of the graph and gets 60% (where accuracy is defined as one minus the ratio of entities involved in an inconsistency to all entities). He then submits a SPARQL query asking how much each rule contributes to the remaining 40% of inaccuracy and finds that the rule "a drug is not a disease" accounts for 35% of the inaccuracy. He has therefore found the main problem with his database.
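The two requests Jack makes might translate to queries of roughly the following shape, under the assumption that the evaluation service materializes ex:ExtractedEntity and ex:violatesRule statements (both names are hypothetical):

  PREFIX ex: <http://example.org/kges#>
  # Entities involved in at least one rule violation versus all entities,
  # from which the accuracy ratio described above can be computed.
  SELECT (COUNT(DISTINCT ?v) AS ?inconsistentEntities)
         (COUNT(DISTINCT ?e) AS ?allEntities)
  WHERE {
    ?e a ex:ExtractedEntity .
    OPTIONAL { ?e ex:violatesRule ?r . BIND(?e AS ?v) }
  }

  PREFIX ex: <http://example.org/kges#>
  # How many distinct entities violate each rule, largest contributors first.
  SELECT ?rule (COUNT(DISTINCT ?e) AS ?violations)
  WHERE { ?e ex:violatesRule ?rule . }
  GROUP BY ?rule
  ORDER BY DESC(?violations)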

Scenario 4: Alex

Alex is a mobile app developer, and he is working on an app that provides question answering for a medical system. He now needs to decide which database he would like to use for his app. He has "Reach output on the 1K papers from the summer 2015 Big Mechanism DARPA evaluation" and "Reach output on the Open Access subset of PubMed" as candidate data sources. He wants to know which database is more reliable. He believes that a drug cannot be a protein at the same time. He inserts his medical knowledge into the ontology of the Graph Evaluation system. Once complete, Alex can submit a SPARQL query to find out the number of inconsistencies in each RDF file. He finds that "Reach output on the Open Access subset of PubMed" has fewer inconsistencies and decides to use it as his data source.

Deborah McGuinness, 02/26/17,
some combination between this usage scenario and the previous one seem most related to your content. we need to be more clear about what is expected in the set of rules. and also in the writeup (not necessarily in this scenario) we need to be clear about why one would have a better database than another. but also it needs to be stated what the difference between the database and the knowledge graph is and what is in the knowledgegraph.
Deborah McGuinness, 02/26/17,
what is an entity here? a class? a set of individuals? a set of all individuals, classes, and properties? other?
Deborah McGuinness, 02/26/17,
this needs more precision as well as improved English
Deborah McGuinness, 02/26/17,
what does this mean? is it that he has inserted a set of rules related to disjointness between classes such as the example given or something else?
Deborah McGuinness, 02/26/17,
how is the database connected to the knowledge graph and how is that related to the output of the ie system?
Deborah McGuinness, 02/26/17,
given your pubmed focus, this could be your first scenario but it needs more precision in the description.
Deborah McGuinness, 02/26/17,
what is in that kg?
Deborah McGuinness, 02/26/17,
what does this mean? does mark have a the equivalent of a data dictionary? a spreadsheet with data? other?
Deborah McGuinness, 02/26/17,
how are the adjustments determined?
Deborah McGuinness, 02/26/17,
what is in this knowledge graph? is it just the ontology or does it have data too?
Deborah McGuinness, 02/26/17,
you will want to have at least one usage scenario in your domain. This example is a little awkward - I am not sure how realistic it is that someone wants to know number of instances of a class - it is in this example, finding instances of rolls or sticky notes? and then finding that some instances like the 9483 is incorrectly classified as instances of both? this usage scenario should be revised to make it more clear

Scenario 5: Rachel

Rachel is a computer science student at Rensselaer Polytechnic Institute. She wants to use the Knowledge Graph Evaluation System to check that the language in a news document is interpreted correctly. The text document is submitted to the application and interpreted by the IE service. The Knowledge Graph Generator converts the three resulting XML files into an RDF knowledge graph. A comparison is made with the provided ontology. The output lists the inconsistent labels, points to mappings to actual SIO terms, presents the count of terms that were mapped, and reports value/type and domain/range mismatches.

IV. Basic Flow of Events

Narrative: Often referred to as the primary scenario or course of events, the basic flow defines the process/data/work flow that would be followed if the use case were to follow its main plot from start to end. Error states or alternate states that might occur as a matter of course in fulfilling the use case should be included under Alternate Flow of Events, below. The basic flow should provide any reviewer a quick overview of how an implementation is intended to work. A summary paragraph should be included that provides such an overview (which can include lists, conversational analysis that captures stakeholder interview information, etc.), followed by more detail expressed via the table structure.

In cases where the user scenarios are sufficiently different from one another, it may be helpful to describe the flow for each scenario independently, and then merge them together in a composite flow.

Basic / Normal Flow of Events
Step 1 (Actor: User): Launches the application.
Step 2 (System: KnowledgeGraphApp, IE, Knowledge Graph Generator): The provided text documents are converted to three XML documents via the IE service. The three XML documents are converted to an RDF graph via the Knowledge Graph Generator.
Step 4 (System: KnowledgeGraphApp, EvaluationService): The Evaluation Service compares the inconsistency ontology with the RDF graph to create results to display to the user.
Step 5 (System: KnowledgeGraphApp, DisplayStats, Check for inconsistencies): Statistics and inconsistencies are provided to the user.
Step 6 (Actor: User): The user may query for specific inconsistencies.

Subordinate Diagram #1 - Text Document to XML to RDF Graph
Step 1 (Actor: User): The user gathers a text document.
Step 2 (Actor: User): Launches the application.
Step 4 (Actor: User): The user submits the text document to the application.
Step 5 (System: KnowledgeGraphApp, IE): The information extraction service splits the text document into three XML documents, corresponding to the entity, event and relationship extraction procedures.
Step 6 (System: KnowledgeGraphApp, Knowledge Graph Generator): The resulting documents are in XML/JSON format. These documents are converted to an RDF graph via the Knowledge Graph Generator. This will utilize a reference ontology.

Deborah McGuinness, 02/26/17,
what does it take to do that?
Deborah McGuinness, 02/26/17,
somewhere we need a description of what is in those 3 xml docs - where is that?
Deborah McGuinness, 02/26/17,
you have not spoken yet about this. this needs more description.
Deborah McGuinness, 02/26/17,
what is this? and how is it done?
Deborah McGuinness, 02/26/17,
you have not spoken yet about how you are getting those.
Deborah McGuinness, 02/26/17,
what are in those 3 created XML files and how do they combine into the RDF knowledge graph?

Subordinate Diagram #2 - RDF Evaluation
Step 1 (System: KnowledgeGraphApp, EvaluationService): The Evaluation Service is utilized to compare the inconsistency ontology and the previously generated RDF graph.
Step 2 (System: KnowledgeGraphApp): Output is a visualization of inconsistencies.

Subordinate Diagram #3 - Inconsistency Display
Step 1 (System: KnowledgeGraphApp, DisplayStats, Check for inconsistencies): A visualization is provided to the user. Retrieves statistics for the extracted graph such as the number of entities, relations and events, the number of populated instances, and the percentage of correctness compared with baseline systems.
Step 2 (Actor: User): The user drills down on specific statistics for instances of classes and applies existing rules to check for inconsistencies.
Step 3 (System: KnowledgeGraphApp): Displays a histogram of inconsistencies with accuracy percentages.

V. Alternate Flow of Events

Narrative: The alternate flow defines the process/data/work flow that would be followed if the use case enters an error or alternate state from the basic flow defined, above. A summary paragraph should be included that provides an overview of each alternate flow, followed by more detail expressed via the table structure.

Alternate Flow of Events - Input Error
Step 1 (Actor: User): Launches the application.
Step 2 (System: KnowledgeGraphApp): The extracted graph cannot be parsed (parsing failure). The error is reported to the user and the application exits before DisplayStats and Check for inconsistencies run.

VI. Use Case and Activity Diagram(s)

Provide the primary use case diagram, including actors, and a high-level activity diagram to show the flow of primary events that include/surround the use case. Subordinate diagrams that map the flow for each usage scenario should be included as appropriate.

Deborah McGuinness, 02/26/17,
another alternative flow is that you get a curated knowledge graph from an external source.
Primary Use Case [diagram not reproduced]

Knowledge Graph Generation [diagram not reproduced]

Upload Knowledge Graph to Evaluation Service [diagram not reproduced]

Inconsistency Visualization Use Case [diagram not reproduced]

Activity Diagram [diagram not reproduced]

Design Diagrams:
Information Extraction System [diagram not reproduced]
Knowledge Graph Generator [diagram not reproduced]
Knowledge Graph Evaluation System [diagram not reproduced]

VII. Competency Questions

1) Let an example rule be: ORGANIZATION disjoint with LOCATION. If there are locations that are extracted as organizations or vice versa, how many instances of type LOCATION are also extracted as ORGANIZATION?

● Ans: The number of concepts extracted as both ORGANIZATION and LOCATION can be detected with a simple SPARQL query, sketched below. A sample set of extracted text is provided below, as well as a sample table that displays all objects returned as a LOCATION. All instances of terms extracted as LOCATION will eventually be viewable in a histogram of results.
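A sketch of such a query, assuming a hypothetical ex: namespace for the extraction tags:

  PREFIX ex: <http://example.org/kges#>
  # Count distinct terms tagged as both LOCATION and ORGANIZATION.
  SELECT (COUNT(DISTINCT ?term) AS ?violations)
  WHERE {
    ?term a ex:LOCATION .
    ?term a ex:ORGANIZATION .
  }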

Example Text: "Officials plan to enter a closed Standing Rock camp site near the Dakota Access Pipeline early Thursday following the arrest of 10 people after a deadline to leave the area expired. North Dakota Gov. Doug Burgum said the remaining 25 to 50 or so protesters holding out in the Oceti Sakowin camp site will be allowed to leave without being arrested so contractors can continue cleaning up the protest site near the controversial 1,172-mile long pipeline. Those who refuse to leave will be arrested."

Example Table Results: Query for all words extracted as LOCATION
  Term: Standing Rock; Extracted Tags: LOCATION, GPE, ...; Inconsistencies: N/A
  Term: Dakota Access Pipeline; Extracted Tags: LOCATION, PROJECT, ...; Inconsistencies: N/A
  Term: North Dakota; Extracted Tags: LOCATION, USA, ...; Inconsistencies: N/A
  Term: Oceti Sakowin camp site; Extracted Tags: LOCATION, ORGANIZATION, ...; Inconsistencies: A LOCATION cannot be an ORGANIZATION.

2) Given a Knowledge Graph from PubMed documents, is it a consistent graph based on the rules defined in the supporting ontology?

Deborah McGuinness, 02/26/17,
what is GPE as a tag?
Deborah McGuinness, 02/26/17,
will the histogram have drill down to get the list or can you provide a separate list view?
Deborah McGuinness, 02/26/17,
this is one class of example rules i was expecting (not the above that mentions a particular instance - clinton). we will need a description of the types of rules you plan to use.

● Ans: If no instances violate the basic rules, then the system would report a consistent graph. For example, if there are no multiple labels across disjoint classes, then the graph is a consistent graph (a query sketch is given below). A basic set of rules that may define consistency are disjointness between entities, disjointness between relations, misclassification of events/processes, misclassification of relations and misclassification of entities.

● Ontology Example: Given the PubMed documents, the IE classifies the information and it is compared to the supporting ontology. In this case, let us say that it finds a total of 40 inconsistencies. The method by which these inconsistencies are classified is not important for this case and will be covered in the other competency questions. The inconsistencies found will be displayed to the user, who can find out why they were considered inconsistent. If the IE found 500 terms, the document would come back as 8% inconsistent.
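One way to phrase the basic disjointness check is as an ASK query using standard OWL vocabulary; whether the supporting ontology actually encodes its rules with owl:disjointWith is an assumption of this sketch:

  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  # Returns true if any instance carries two classes declared disjoint in the
  # supporting ontology; false means the graph is consistent with these rules.
  ASK {
    ?classA owl:disjointWith ?classB .
    ?instance a ?classA , ?classB .
  }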

3) What conditions are required for a misclassification of an Entity to be detected by the system?

● Ans: If the Supporting Ontology contains the correct or expected classification of the term in question, then this misclassification may be detected (a query sketch is given after the table below).

● Ontology Example: Cotinine is misclassified as an Environmental Phenol, although it is actually a Tobacco Metabolite. If the supporting ontology included a hierarchy tree of analyte relations, then the Knowledge Graph Evaluator could determine that a misclassification has occurred.

Example Table Results: Query extraction of PHENOLS
  Term: Benzophenone-3; Extracted Tags: PHENOL; Inconsistencies: N/A
  Term: 6-Tetrachlorophenol; Extracted Tags: PHENOL; Inconsistencies: N/A
  Term: 2-Tetrachlorophenol; Extracted Tags: PHENOL; Inconsistencies: N/A
  Term: Cotinine; Extracted Tags: PHENOL; Inconsistencies: Actually a TOBACCO METABOLITE
  Term: 3-butadiene mercapturic acid; Extracted Tags: PHENOL; Inconsistencies: Actually a TOBACCO METABOLITE
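A sketch of this check, assuming the supporting ontology supplies an expected class for each term through a hypothetical ex:hasExpectedClass property:

  PREFIX ex: <http://example.org/kges#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  # Terms extracted as PHENOL whose expected class in the analyte hierarchy
  # is not PHENOL or one of its subclasses.
  SELECT ?term ?expectedClass WHERE {
    ?term a ex:PHENOL .
    ?term ex:hasExpectedClass ?expectedClass .
    FILTER NOT EXISTS { ?expectedClass rdfs:subClassOf* ex:PHENOL . }
  }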

4) Let the rule in the ontology be: if there is a relation "catalysis accelerate" between two objects classified as "catalyst" and "chemical reaction", then there should not be a relation "slows down" between the same two objects. In other words, the two relations are disjoint with each other. The question is: "Are all the relations extracted consistent with this rule?"

● Ans: If the extracted relations contain such an inconsistent pair of relations, the system will offer a visualization of the inconsistent disjoint relations, like the picture in the next question (a query sketch is given after the table below).

● Ontology Example:

Example Table Results: Query extraction of CATALYST
  No. 1: Term: MnO2 (Extracted Tag: CATALYST); Term: H2O2 decomposition (Extracted Tag: chemical reaction); Relation: catalysis accelerate; Inconsistencies: Inconsistent with relation No. 2
  No. 2: Term: MnO2 (Extracted Tag: CATALYST); Term: H2O2 decomposition (Extracted Tag: chemical reaction); Relation: slows down; Inconsistencies: Inconsistent with relation No. 1
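The generic form of this check might be queried as follows, under the assumption that the relation-disjointness rules are encoded with OWL 2's owl:propertyDisjointWith:

  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  # Pairs of objects connected by two relations that the supporting ontology
  # declares disjoint (e.g. "catalysis accelerate" and "slows down").
  SELECT ?subject ?object ?relA ?relB WHERE {
    ?relA owl:propertyDisjointWith ?relB .
    ?subject ?relA ?object .
    ?subject ?relB ?object .
  }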

5) Given the results of the Knowledge Graph Evaluation System, what would the inconsistent graph look like?

Deborah McGuinness, 02/26/17,
is this a question a user asks to the system? this again seems like a question you ask to the system designer
Deborah McGuinness, 02/26/17,
ok -work this out more - what is the owl used or just the first order logic used to represent this? use this (or something like it) for your next refinement pass
Deborah McGuinness, 02/26/17,
work this one out in a bit more detail - what are the rules used in this case? and give an example of a small portion of the analyte relation hierarchy that you are using. Note you mention environmental phenol - presumably a class that is a subclass of phenol. you do not say how that relates to metabolite or tobacco metabolite nor the relations you are using.
Deborah McGuinness, 02/26/17,
give at least a few example rules - and make sure to give examples of the different types of rules you mention above. Please make sure to do this for the next assignment refinement

● Ans: After the information has been processed, all inconsistencies are stored in a triple store or database. A visualization would show the overall knowledge graph, with the inconsistencies marked by X's. The referenced diagram showcases the supporting ontology with the incorrect matches that came up when the graph was processed by the Knowledge Graph Evaluation System [diagram not reproduced].

6) Consider the case where an Entity is not already defined in a Supporting Ontology. If one source classified an Entity a certain way which is contradictory to a different source, how do you know which source is more likely to be correct?

● Ans: The system would classify this scenario as either consistent or inconsistent. If the system detects this as an inconsistency, it would fall under the category of either a disjointness or a mismatch, depending on the given scenario.

● Ontology Example: Of two corpora that present the gene sequence for Brown Eyes, one defines the sequence as AAAGTTCCTTT while the other insists that it is AAGGTTCCTTC. The correct gene sequence for brown eyes is not already defined in a supporting ontology.

i) Scenario 1: The system knows that brown eyes are supposed to have a gene sequence, but doesn’t know which one is right.

○ The KGES system would classify this scenario as a disjointness inconsistency. Although brown eyes are supposed to have an associated gene sequence, it doesn’t know which is right. One could be right and the other could be wrong, they could both be wrong or they could both be right. Because connectivity is not defined, this classifies as an inconsistency.

ii) Scenario 2: The KGES defines that brown eyes are not supposed to have a gene sequence, but the data makes this connection (this case is unrelated to the above example, but it is useful for addressing this scenario).

○ The KGES system marks this as a mismatch inconsistency, because brown eyes are classified as having no associated gene sequence.

7) How will the KGES detect disjointness-of-relations inconsistencies?

● Ans: It will use disjointness rules defined in a Supporting Ontology to determine if an inconsistency has occurred.

● Ontology Example: Consider two documents that define the relationship between Barack and Michelle Obama.

Deborah McGuinness, 02/26/17,
this again is a question to the system designer. a competency question is a question you ask to the system. You can however modify this to a question you are asking of the system of finding instances that are related by two relationships that should not exist simultaneously. you can use your example as one of the things you are modeling in the next homework. although since you are working with the pubmed corpus you might try to focus your conceptual modeling around things you will use there.
Deborah McGuinness, 02/26/17,
how are you thinking of modeling this in the ontology? try to use this example in the next homework.
Deborah McGuinness, 02/26/17,
try to work this in more detail. what is the inconsistency - is it that 2 are there and there should be at most 1 ? or might it be possible (according to the background knowledge ontology that there could be 2) or something else?
Deborah McGuinness, 02/26/17,
is there a property called something like "hasGeneSequence" and does that have a cardinality restriction so that there is only one? or something else? - i think you can work with this example but it needs more clarity
Deborah McGuinness, 02/26/17,
what is the class in the ontology? what is the instance in the ontology?
Deborah McGuinness, 02/26/17,
do you mean that an instance is said to be both for example a location and an organization - thus labeled with 2 disjoint classes? or something else?
Deborah McGuinness, 02/26/17,
is this a class, property, or instance?
Deborah McGuinness, 02/26/17,
this picture is interesting and possibly very useful. are you saying that were we see a blue x the bottom node should NOT be a subclass of the node above it? what are the links between the nodes? are they all subclass links or other? and are the nodes that have nodes below them all classes? are the leaf nodes instances? or are all the nodes instances? and then a node with a red x is an instance that has the wrong class label? and the blue x on a link what precisely does that mean? give a concrete example - put in labels on the nodes and links. this *might* be a very nice way to display things - i am just trying to make you get more precise

One defines the relation: BARACK_OBAMA hasWife MICHELLE_OBAMA
The other defines the relation: BARACK_OBAMA hasDaughter MICHELLE_OBAMA

If the Supporting Ontology includes the disjointness rule hasDaughter disjoint hasWife, since (typically) a person's wife is not also his daughter, then the KGES should detect the inconsistency (see the query sketch below).
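A concrete version of this check, using the illustrative property names above under a hypothetical ex: namespace:

  PREFIX ex: <http://example.org/kges#>
  # Individuals asserted to be both the wife and the daughter of the same person.
  SELECT ?person ?other WHERE {
    ?person ex:hasWife ?other .
    ?person ex:hasDaughter ?other .
  }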

● Ans: This scenario depicts a situation where a term is classified incorrectly. This would be considered a role mismatch inconsistency because the term does not match the expected definition.

● Ontology Example: The term, Apple, is defined in the supporting ontology. Take the following example of text input:

i) "I ate an apple for lunch."

● The IE interprets the word, apple, as an ORGANIZATION. However, the word should be tagged as a FRUIT. It matches the case of the term being edible, yet an ORGANIZATION cannot be edible. If a term is classified incorrectly with regard to its sub-relations, it indicates that the original tag must be incorrect.

8) Which terms are involved in significantly more inconsistencies than others?

● Ans: After the KGES completes the evaluation process, it builds a table of how many inconsistencies each term is involved in. Terms with statistically significantly more inconsistencies are considered troublesome terms and can be retrieved by the user (a query sketch is given after the table below).

● Ontology Example: Given that each term is involved in the following number of inconsistencies, Cotinine and 3-butadiene mercapturic acid would be considered troublesome terms.

Example Table Results: Query extraction of PHENOLS
  Term: Benzophenone-3; Inconsistencies: 2
  Term: 6-Tetrachlorophenol; Inconsistencies: 3
  Term: 2-Tetrachlorophenol; Inconsistencies: 5
  Term: Cotinine; Inconsistencies: 30
  Term: 3-butadiene mercapturic acid; Inconsistencies: 35
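A sketch of the per-term tally, assuming the evaluation results are stored as ex:Inconsistency resources linked to terms via a hypothetical ex:involvesTerm property:

  PREFIX ex: <http://example.org/kges#>
  # Number of inconsistency records each term participates in,
  # most troublesome terms first.
  SELECT ?term (COUNT(?inconsistency) AS ?count) WHERE {
    ?inconsistency a ex:Inconsistency ;
                   ex:involvesTerm ?term .
  }
  GROUP BY ?term
  ORDER BY DESC(?count)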

9) Give the knowledge graph for the XML files provided by the user.

● Ans: The Knowledge Graph Generator will generate compliant, error-free RDF files for the input XML files.

VIII. Resources

In order to support the capabilities described in this Use Case, a set of resources must be available and/or configured. These resources include the set of actors listed above, with additional detail, and any other ancillary systems, sensors, or services that are relevant to the problem/use case.

Knowledge Bases, Repositories, or other Data Sources
  Data: REACH output in FRIES format on 1K papers
  Type: JSON, NXML
  Characteristics: A large extracted dataset of XML documents; the input source is PubMed.
  Description: The dataset is the output of Clulab's information extraction system on PubMed documents. The system extracts entities, events and relationships according to the FRIES format.
  Owner: Mihai Surdeanu
  Source: Mihai Surdeanu, Tom Hicks
  Access Policies & Usage: Academic usage

Deborah McGuinness, 02/26/17,
what does it mean "how much inconsistent each term is involved with"?
Deborah McGuinness, 02/26/17,
this looks like it is all modeled as classes - not roles .please be more precise if you are going to use this as one of your competency questions.

External Ontologies, Vocabularies, or other Model Services
  Resource: UniProt; Language: RDFS; Description: Protein sequence and functional information; Source: http://www.uniprot.org/core/; Describes/Uses: Used to link definitions of protein sequences that are extracted by the IE system; Access Policies & Usage: Free and open source, academic use.
  Resource: SIO; Language: RDFS; Description: SIO is an integrated ontology that describes all kinds of objects, processes and attributes for the biomedical sciences; Owner: Michel Dumontier; Source: https://github.com/micheldumontier/semanticscience; Access Policies & Usage: Free and open source.

Other Resources, Services, or Triggers (e.g., event notification services, application services, etc.)
  Resource: Virtuoso; Type: Triple Store; Description: Virtuoso will be the triple store of choice to store the RDF graphs. For now we shall host it on zen.cs.rpi.edu, but we could use any Tetherless World server as well; Source: https://virtuoso.openlinksw.com/; Access Policies & Usage: Free and open source version.

IX. References and Bibliography

List all reference documents – policy documents, regulations, standards, de-facto standards, glossaries, dictionaries and thesauri, taxonomies, and any other reference materials considered relevant to the use case.

FRIES-output-spec-v0.10_160502: https://drive.google.com/drive/folders/0B_wQciWI5yE6RWFuSWlCWG8tQWM

1. McGuinness, D. L., Fikes, R., Rice, J., & Wilder, S. (2000, April). An environment for merging and testing large ontologies. In KR (pp. 483-493).

2. Guo, M., Liu, Y., Li, J., Li, H., & Xu, B. (2014, May). A knowledge based approach for tackling mislabeled multi-class big social data. In European Semantic Web Conference (pp. 349-363). Springer International Publishing.

3. McGuinness, D. L., Fikes, R., Rice, J., & Wilder, S. (2000). The chimaera ontology environment. AAAI/IAAI, 2000, 1123-1124.

4. Brank, J., Grobelnik, M., & Mladenic, D. (2005, October). A survey of ontology evaluation techniques. In Proceedings of the conference on data mining and data warehouses (SiKDD 2005) (pp. 166-170).

5. Pujara, J., Miao, H., Getoor, L., & Cohen, W. W. (2013). Knowledge graph identification.

6. Hlomani, H., & Stacey, D. (2014). Approaches, methods, metrics, measures, and subjectivity in ontology evaluation: A survey. Semantic Web Journal, 1-5.

7. Gómez-Pérez, A. OOPS! (OntOlogy Pitfall Scanner!): supporting ontology evaluation on-line.

8. Valenzuela-Escárcega, M. A., Hahn-Powell, G., & Surdeanu, M. (2015). Description of the Odin event extraction framework and rule language. arXiv preprint arXiv:1509.07513.

9. Dumontier, M., et al. (2014). The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 5(1), 14.

X. Notes

Deborah McGuinness, 02/26/17,
you have some additional refs to add right - at least some from jiao tao. I am not sure if you were including others since last week.

There is always some piece of information that is required that has no other place to go. This is the place for that information.