Artificial Intelligence Special Lecture Project: Development of a Decision Tree Algorithm for Semantic Web Data
2010313148
전동규
Agenda
1. Project Purpose
2. Motivation
3. Related Work
4. Algorithm
5. Description of Problem
1. Project Purpose
Relational data (Semantic Web data, i.e. Linked Data)
+ Decision tree algorithm (C4.5 algorithm)
= Semantic Decision Tree
• Goal of project
Develop a new kind of decision tree algorithm that supports decision making based on information from the Semantic Web environment
Solve several problems addressed in related research
• Data: Linked Data (Semantic Web ontology)
• Ontology
Class: definition of a set of individuals
Property: relation between instances (or between an instance and a data value)
Instance: individual that belongs to one or more classes
[Figure: Schema of Example Ontology. Legend: class, object property, datatype property, range type. Labels shown include the classes Person, Doctor, Teacher, Student, Workplace, Hospital and School; the properties hasParent, hasChild, working_at, name, age and Rich; the range types String, Int and Boolean; and Location.]
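To make the three ontology building blocks concrete, here is a minimal sketch that stores a few instances as (subject, property, object) triples using labels from the example schema. The individuals (alice, bob, carol, stmary) and their values are hypothetical, and plain Python tuples stand in for a real RDF store.

```python
# Hypothetical instance data using labels from the example schema.
# Class membership is stated with rdf:type; every other triple is a property assertion.
RDF_TYPE = "rdf:type"

triples = [
    ("alice", RDF_TYPE, "Doctor"),
    ("bob", RDF_TYPE, "Student"),
    ("carol", RDF_TYPE, "Teacher"),
    ("stmary", RDF_TYPE, "Hospital"),
    ("alice", "name", "Alice"),          # datatype property, String range
    ("alice", "age", 45),                # datatype property, Int range
    ("alice", "Rich", True),             # datatype property, Boolean range
    ("alice", "working_at", "stmary"),   # object property
    ("alice", "hasChild", "bob"),        # object property with two values
    ("alice", "hasChild", "carol"),
    ("bob", "hasParent", "alice"),
]


def instances_of(cls, triples):
    """Return the individuals explicitly asserted to belong to class `cls`."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def values_of(subject, prop, triples):
    """Return every value of property `prop` for `subject`, in insertion order."""
    return [o for s, p, o in triples if s == subject and p == prop]


print(instances_of("Doctor", triples))          # {'alice'}
print(values_of("alice", "hasChild", triples))  # ['bob', 'carol']
```

Only explicit assertions are returned; no subclass reasoning (for example, inferring that a Doctor is a Person) is performed in this sketch.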
2. Motivation
What is the Semantic Web?
• Definition of Semantic Web
The Semantic Web is an evolving extension of the World Wide Web in which the meaning (semantics) of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use web content.
The Semantic Web is based on ontologies, which correspond to Semantic Web data.
• Linked Data
The term Linked Data describes a method of exposing, sharing, and connecting data on the Web.
[1] Berners-Lee, T., Hendler, J., and Lassila, O. (2001), “The Semantic Web,” Scientific American, Vol. 284, No. 5, pp. 34-43.
• Increase of Semantic Web Data
Appearance of Semantic Web document search engines
Falcons: over twenty million RDF/XML documents
Swoogle: over three million Semantic Web documents
Open data in the Semantic Web
LINKING OPEN DATA:
The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources.
The data sets consist of over 4.7 billion RDF triples, interlinked by around 142 million RDF links (May 2009).
Development status of Semantic Web
Falcons
Date | RDF/XML documents | Quadruples
2009-09-02 | 21,639,337 | 2,936,868,638
2009-05-29 | 19,919,364 | 2,177,084,709
Swoogle
Date | Semantic Web documents | Triples
2009-10-17 | 3,109,616 | 1,065,799,526
Semantic Web-based portal sites
Twine:
Twine is a Semantic Web portal that builds networks of information from users' posts, which contain their own information and favorite things.
All information in Twine is written in RDF and OWL format.
Twine has millions of visitors a month and holds millions of relationships among 3 million semantic tags (March 2009).
The need to mine useful knowledge from very large ontologies is growing. Data mining methods for the Semantic Web should therefore be ready to meet this need.
Decision Tree in Semantic Web
• Traditional decision tree algorithms cannot be applied directly to the Semantic Web
Semantic Web ontologies have special characteristics that affect mining
Because Semantic Web documents have a network structure, the multi-value issue arises: one instance can have several values for the same property
A traditional decision tree uses only the values of variables, so the additional information in the Semantic Web cannot be exploited
Converting Semantic Web data into the single-table form used by traditional decision tree algorithms is not possible without losing information
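A small sketch can show why the multi-value issue breaks the single-table layout. Under the assumption that the table has one row per instance and one column per property (the layout C4.5-style learners expect), a second value for the same property has nowhere to go. The data below is hypothetical.

```python
# Hypothetical assertions: alice has two values for the object property hasChild.
triples = [
    ("alice", "age", 45),
    ("alice", "hasChild", "bob"),
    ("alice", "hasChild", "carol"),
    ("dave", "age", 30),
]


def naive_single_table(triples):
    """Flatten triples into one row per subject and one column per property.
    A later value for the same column silently overwrites an earlier one."""
    table = {}
    for s, p, o in triples:
        table.setdefault(s, {})[p] = o
    return table


def multi_valued(triples):
    """Return the (subject, property) pairs that carry more than one value."""
    counts = {}
    for s, p, _ in triples:
        counts[(s, p)] = counts.get((s, p), 0) + 1
    return {key for key, n in counts.items() if n > 1}


table = naive_single_table(triples)
print(table["alice"]["hasChild"])  # carol  (bob has been lost)
print(multi_valued(triples))       # {('alice', 'hasChild')}
```

The row for dave also has no hasChild column at all, so the same flattening produces missing values as well as lost ones.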
3. Related Work
• Arno J. Knobbe [2] developed a decision tree algorithm for multi-relational databases
Selection graphs were proposed for inducing decision trees on relational databases (RDBs)
A selection graph is composed of nodes, edges, and conditions, and can be expressed in SQL syntax
This work offers a partial solution to the multi-value issue, which also arises in Semantic Web ontologies. However, the method cannot be applied to the Semantic Web, which contains much more information than an RDB.
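Because a selection graph can be expressed in SQL, a rough sketch of that translation may help. This is a simplified reading of the idea, not Knobbe's implementation: it handles only edges that require a related record to exist and ignores edges expressing the absence of related records. The table and column names (person, child, parent_id) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    table: str                                       # table this node ranges over
    conditions: list = field(default_factory=list)   # SQL predicates on that table


@dataclass
class Edge:
    src: int       # index of the parent node
    dst: int       # index of the child node
    src_col: str   # join column on the parent side
    dst_col: str   # join column on the child side


def to_sql(nodes, edges, key="id"):
    """Translate a positive-edge-only selection graph into a SQL query that
    returns the target objects (node 0) matching the whole graph.
    Strings are concatenated directly, so inputs must be trusted."""
    aliases = [f"t{i}" for i in range(len(nodes))]
    from_clause = ", ".join(f"{n.table} {a}" for n, a in zip(nodes, aliases))
    where = [f"{aliases[e.src]}.{e.src_col} = {aliases[e.dst]}.{e.dst_col}" for e in edges]
    for n, a in zip(nodes, aliases):
        where += [f"{a}.{c}" for c in n.conditions]
    sql = f"SELECT DISTINCT {aliases[0]}.{key} FROM {from_clause}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


# "Persons with at least one child aged 21 or over."
nodes = [Node("person"), Node("child", ["age >= 21"])]
edges = [Edge(0, 1, "id", "parent_id")]
print(to_sql(nodes, edges))
```

Each refinement of the tree would add a node, an edge, or a condition, and the resulting query selects the records covered by that branch.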
• David Jensen [3] proposed a method for converting social network data into single-table data to which traditional decision tree algorithms can be applied
QGraph, a query language for extracting a local network from the entire social network, was proposed
A QGraph query is composed of nodes, edges, and conditions, and can query many objects at once
Because ontology information must be converted to single-table form manually, much information is lost
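QGraph's own syntax is not reproduced here. As a generic illustration of why turning a network into single-table rows loses information, the hypothetical sketch below extracts the one-hop neighbourhood of a node and summarises it as aggregate counts. The edges, types, and feature names are invented for the example.

```python
from collections import defaultdict

# Hypothetical social network edges: (source, relation, target).
edges = [
    ("alice", "hasChild", "bob"),
    ("alice", "hasChild", "carol"),
    ("bob", "friendOf", "dave"),
]
node_type = {"alice": "Doctor", "bob": "Student", "carol": "Teacher", "dave": "Student"}


def local_network(center, edges):
    """Return the one-hop neighbourhood of `center` as (relation, neighbour) pairs."""
    return [(r, t) for s, r, t in edges if s == center]


def to_row(center, edges, node_type):
    """Summarise the local network as a flat row of aggregate features.
    Neighbour identities, and anything beyond one hop (e.g. bob's friendOf
    link when the row is built for alice), are not kept."""
    row = {"type": node_type[center]}
    counts = defaultdict(int)
    for rel, nbr in local_network(center, edges):
        counts[f"n_{rel}"] += 1
        counts[f"n_{rel}_{node_type[nbr]}"] += 1
    row.update(counts)
    return row


print(to_row("alice", edges, node_type))
```

Every choice of which aggregates to compute is made by hand, which is where the manual conversion and the resulting information loss come from.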
[2] Knobbe, A. J., Siebes, A., and Van Der Wallen, D. (1999), “Multi-Relational Decision Tree Induction,” In Proceedings of PKDD’99.
[3] Jensen, D., and Neville, J. (2002), “Data mining in networks,” Papers of the Symposium on Dynamic Social Network Modeling and Analysis.
4. Algorithm
• The search procedure of the algorithm follows the C4.5 algorithm
• New methods are required to learn concepts in an ontology
• ‘Constructors’ can be used in much the same way as attributes in a traditional decision tree
• Related work uses the term ‘refinement’ for what plays the role of an attribute in a decision tree
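C4.5 chooses splits by gain ratio. The sketch below shows how candidate refinements, modelled here as named boolean tests on an instance, could be ranked by that criterion. It is a simplification under stated assumptions: binary tests only, and C4.5's average-gain filter, pruning, and missing-value handling are omitted. The instances, labels, and refinement names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, branches):
    """C4.5 gain ratio of splitting `labels` into `branches` (one label list per outcome)."""
    n = len(labels)
    nonempty = [b for b in branches if b]
    remainder = sum(len(b) / n * entropy(b) for b in nonempty)
    gain = entropy(labels) - remainder
    split_info = -sum(len(b) / n * math.log2(len(b) / n) for b in nonempty)
    return gain / split_info if split_info > 0 else 0.0


def best_refinement(instances, labels, refinements):
    """Return the name of the refinement (a boolean test) with the highest gain ratio."""
    def score(test):
        yes = [y for x, y in zip(instances, labels) if test(x)]
        no = [y for x, y in zip(instances, labels) if not test(x)]
        return gain_ratio(labels, [yes, no])
    return max(refinements, key=lambda name: score(refinements[name]))


# Hypothetical instances and class labels.
people = [
    {"type": "Doctor", "children": 3},
    {"type": "Teacher", "children": 2},
    {"type": "Doctor", "children": 4},
    {"type": "Student", "children": 0},
]
rich = ["yes", "no", "yes", "no"]
refinements = {
    "Doctor": lambda p: p["type"] == "Doctor",
    ">= 1 hasChild": lambda p: p["children"] >= 1,
    "not Student": lambda p: p["type"] != "Student",
}
print(best_refinement(people, rich, refinements))  # Doctor
```

The tree would then recurse on each branch with the remaining candidate refinements, as C4.5 does with attributes.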
Refinements
• What is a refinement?
A refinement is a condition for splitting branches in the decision tree. In this algorithm, properties and classes from the ontology are used as refinements.
When defining a refinement, role constructors from Description Logic are applied to make the best use of the information in the Semantic Web.
• Types of refinements
Concept Constructor Refinement: applies the type information of instances
Cardinality restriction Refinement: applies cardinality information on an object property
Domain restriction Refinement: applies the value of a datatype property
Qualification Refinement: applies quantification restrictions and the range class of an object property
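Each refinement type can be read as a boolean test on an individual. The sketch below is one possible reading, under two loud assumptions: negation is evaluated closed-world (a class not asserted counts as false, unlike OWL's open-world semantics), and distinct names are treated as distinct individuals when counting. The data and all function names are hypothetical.

```python
# Hypothetical assertions about individuals.
types = {
    "ann": {"Human", "Blond"},
    "ben": {"Human"},
    "cat": {"Human", "Blond"},
    "dan": {"Human"},
    "stm": {"Hospital"},
}
object_props = {
    ("ann", "hasChild"): ["ben", "cat", "dan"],
    ("ben", "hasChild"): ["dan"],
}
data_props = {("ann", "age"): [52], ("ben", "age"): [19]}


def concept(cls):
    """Concept constructor refinement: x is asserted to be an instance of cls."""
    return lambda x: cls in types.get(x, set())


def complement(cls):
    """Negated concept, read closed-world: not asserted means false."""
    return lambda x: cls not in types.get(x, set())


def domain_restriction(prop, test):
    """Domain restriction refinement: some value of datatype property prop passes test."""
    return lambda x: any(test(v) for v in data_props.get((x, prop), []))


def at_least(n, prop):
    """Cardinality restriction refinement: x has at least n distinct prop-successors."""
    return lambda x: len(set(object_props.get((x, prop), []))) >= n


def some_values_from(prop, cls):
    """Qualification refinement (existential form): some prop-successor is in cls."""
    return lambda x: any(cls in types.get(y, set()) for y in object_props.get((x, prop), []))


is_hospital = concept("Hospital")
not_human = complement("Human")
adult = domain_restriction("age", lambda v: v >= 21)
three_children = at_least(3, "hasChild")
blond_child = some_values_from("hasChild", "Blond")

print([x for x in types if three_children(x)])  # ['ann']
```

Tests built this way can be passed directly to a gain-ratio based split selector, since each one partitions the instances into a "yes" and a "no" branch.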
• Developed Refinements (examples)
Concept Constructor Refinement: Hospital; not Human
Domain restriction Refinement: Age.(≥ 21)
Cardinality restriction Refinement: ≥ 3 hasChild
Qualification Refinement: hasChild.Blond
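For reference, the standard Description Logic semantics of these example constructors can be written as follows, for an interpretation I with domain Δ. Reading Age.(≥ 21) as an existential datatype restriction, and listing both quantified forms of hasChild.Blond, are assumptions, since the examples do not show a quantifier.

```latex
\begin{aligned}
(\neg \mathit{Human})^{\mathcal{I}} &= \Delta^{\mathcal{I}} \setminus \mathit{Human}^{\mathcal{I}} \\
(\exists \mathit{age}.{\geq 21})^{\mathcal{I}} &= \{\, x \mid \exists v : (x, v) \in \mathit{age}^{\mathcal{I}} \wedge v \geq 21 \,\} \\
({\geq 3}\,\mathit{hasChild})^{\mathcal{I}} &= \{\, x \mid \#\{\, y \mid (x, y) \in \mathit{hasChild}^{\mathcal{I}} \,\} \geq 3 \,\} \\
(\exists \mathit{hasChild}.\mathit{Blond})^{\mathcal{I}} &= \{\, x \mid \exists y : (x, y) \in \mathit{hasChild}^{\mathcal{I}} \wedge y \in \mathit{Blond}^{\mathcal{I}} \,\} \\
(\forall \mathit{hasChild}.\mathit{Blond})^{\mathcal{I}} &= \{\, x \mid \forall y : (x, y) \in \mathit{hasChild}^{\mathcal{I}} \rightarrow y \in \mathit{Blond}^{\mathcal{I}} \,\}
\end{aligned}
```

Note that an individual with no hasChild successors satisfies the ∀ form vacuously, so the two quantified readings can split the same instances differently.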
• Syntax that can be expressed in an ontology
RDF: rdf:type
RDFS: rdfs:domain, rdfs:range, rdfs:subClassOf, rdfs:subPropertyOf
OWL: owl:AllDifferent, owl:allValuesFrom, owl:cardinality, owl:Class, owl:complementOf, owl:DatatypeProperty, owl:DataRange, owl:differentFrom, owl:disjointWith, owl:equivalentClass, owl:equivalentProperty, owl:FunctionalProperty, owl:hasValue, owl:intersectionOf, owl:InverseFunctionalProperty, owl:inverseOf, owl:maxCardinality, owl:minCardinality, owl:ObjectProperty, owl:oneOf, owl:sameAs, owl:someValuesFrom, owl:SymmetricProperty, owl:TransitiveProperty, owl:unionOf
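Several of the OWL terms above are what an ontology uses to state restrictions like those behind the refinements. As a hedged illustration, not the project's code, the sketch below emits the standard OWL-in-RDF triples for an at-least-3 cardinality restriction, an existential restriction, and a class complement as plain strings. The `:`-prefixed names are hypothetical, the usual rdf:, owl: and xsd: prefix declarations are assumed, and blank-node labels are generated.

```python
from itertools import count

_counter = count()


def fresh_bnode():
    """Return a new blank-node label such as _:b0."""
    return f"_:b{next(_counter)}"


def min_cardinality(prop, n):
    """owl:Restriction meaning 'at least n values of prop' (unqualified)."""
    r = fresh_bnode()
    return r, [
        (r, "rdf:type", "owl:Restriction"),
        (r, "owl:onProperty", prop),
        (r, "owl:minCardinality", f'"{n}"^^xsd:nonNegativeInteger'),
    ]


def some_values_from(prop, cls):
    """owl:Restriction meaning 'some value of prop belongs to cls'."""
    r = fresh_bnode()
    return r, [
        (r, "rdf:type", "owl:Restriction"),
        (r, "owl:onProperty", prop),
        (r, "owl:someValuesFrom", cls),
    ]


def complement_of(cls):
    """Anonymous class meaning 'not cls'."""
    c = fresh_bnode()
    return c, [(c, "rdf:type", "owl:Class"), (c, "owl:complementOf", cls)]


for node, triples in (
    min_cardinality(":hasChild", 3),
    some_values_from(":hasChild", ":Blond"),
    complement_of(":Human"),
):
    for s, p, o in triples:
        print(s, p, o, ".")
```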
5. Description of Problem
Train problem
[Figure: Ten trains]
• After learning, the algorithm finds a definition of an eastbound train
• An artificial task of learning to predict whether a train is headed east or west
• The data consists of relational tuples
• Relations
eastbound(T): train T is eastbound
has-car(T,C): C is a car of T
infront(C,D): car C is in front of car D
long(C): car C is long
open-rectangle(C): car C is shaped as an open rectangle (similar relations for five other shapes)
jagged-top(C): C has a jagged top
sloping-top(C): C has a sloping top
open-top(C): C is open
contains-load(C,L): C contains load L
1-item(C): C has one load item (similar relations for two and three load items)
2-wheels(C): C has two wheels
3-wheels(C): C has three wheels
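The relations above are ordinary relational tuples. The sketch below encodes a small invented fragment as Python sets and checks a hypothetical candidate rule against the eastbound labels. Neither the data nor the rule is the definition learned in this project; "short" and "closed" are read as the absence of long(C) and open-top(C) tuples (negation as failure).

```python
# Hypothetical fragment in the relational format above (not the original ten trains).
has_car = {("t1", "c11"), ("t1", "c12"), ("t2", "c21"), ("t2", "c22")}
infront = {("c11", "c12"), ("c21", "c22")}
long_car = {"c12", "c21"}        # long(C)
open_top = {"c21", "c22"}        # open-top(C)
eastbound = {"t1"}               # eastbound(T)


def cars(train):
    """All cars C such that has-car(train, C) holds."""
    return {c for t, c in has_car if t == train}


def first_car(train):
    """The car of `train` that no other car is in front of.
    Unpacking raises ValueError unless exactly one such car exists."""
    cs = cars(train)
    behind = {d for c, d in infront if c in cs}
    (head,) = cs - behind
    return head


def candidate_rule(train):
    """Hypothetical rule: the train has a car that is short and closed."""
    return any(c not in long_car and c not in open_top for c in cars(train))


predicted = {t for t in ("t1", "t2") if candidate_rule(t)}
print(first_car("t1"))          # c11
print(predicted == eastbound)   # True
```

A multi-valued relation such as has-car is exactly what the single-table layout struggles with: each train has a varying number of cars, so a rule about "some car" has no fixed column to test.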