[Lecture Notes in Computer Science] Advances in Artificial Intelligence Volume 6085 || Semantic Annotation Using Ontology and Bayesian Networks

A. Farzindar and V. Keselj (Eds.): Canadian AI 2010, LNAI 6085, pp. 416–418, 2010. © Springer-Verlag Berlin Heidelberg 2010

Semantic Annotation Using Ontology and Bayesian Networks

Quratulain Rajput

Faculty of Computer Science, Institute of Business Administration, Karachi, Pakistan

Abstract. This research presents a semantic annotation framework named BNOSA. The framework uses an ontology to conceptualize a problem domain and Bayesian networks to resolve conflicts and to predict missing data. Experiments have been conducted to analyze the performance of the framework on different problem domains. The corpora used in the experiments come from selling-purchasing websites where product information is entered by ordinary web users.

Keywords: Ontology, Bayesian Network, Semantic Annotation.

1 Introduction

A large amount of useful information on the web is available in unstructured or semi-structured format. This includes reports, scientific papers, reviews, product advertisements, news, emails, Wikipedia, etc. A significant percentage of these sources contain ungrammatical and incoherent content, where information is presented as a collection of words without following any grammatical rules. Several efforts have been made to extract relevant information from such content [1-4].

2 BNOSA Framework and Results

The proposed BNOSA (Bayesian Network and Ontology based Semantic Annotation) framework is capable of dynamically linking a domain-specific ontology and the corresponding BN (learnt separately) to annotate information extracted from the web. This dynamic linking capability makes it highly scalable and applicable to any problem domain. The extraction of data is performed in two phases, as depicted in Fig. 1.

Phase-I: To extract the information, two issues need to be addressed: (a) finding the location of relevant data on a web page and (b) defining patterns for extracting such data. To solve the location problem, lists of context words are defined for each attribute of the extraction ontology. If a context word is matched in the text, this suggests that the corresponding value should be available in the neighborhood of that word. The extraction rules are generated on the basis of the attributes' data types.
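The Phase-I idea can be illustrated with a minimal sketch. It is not the BNOSA implementation: the context words, the regular expressions standing in for data-type rules, the window size and the function name extract_candidates are all assumptions made for illustration, and the neighborhood is taken to be the text immediately following the context word.

```python
import re

# Hypothetical context words per ontology attribute (not taken from the paper).
CONTEXT_WORDS = {
    "price": ["price", "asking"],
    "year": ["year"],
}

# Hypothetical data-type rules: one pattern per attribute.
TYPE_PATTERNS = {
    "price": re.compile(r"\d[\d,]*"),            # a number with optional commas
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),   # a four-digit year
}


def extract_candidates(text, window=30):
    """Return {attribute: [candidate values]} found near context words."""
    lowered = text.lower()
    found = {}
    for attribute, words in CONTEXT_WORDS.items():
        for word in words:
            for match in re.finditer(r"\b" + re.escape(word) + r"\b", lowered):
                # Search only the neighborhood right after the context word.
                neighborhood = text[match.end():match.end() + window]
                value = TYPE_PATTERNS[attribute].search(neighborhood)
                if value:
                    found.setdefault(attribute, []).append(value.group())
    return found


ad = "Honda Civic, year 2005, price 450,000 negotiable"
print(extract_candidates(ad))
```

An attribute whose context word is missing from the ad is simply absent from the result, and an attribute matched more than once receives several candidates; these are the cases left for Phase-II.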

Phase-II: When extracting information from unstructured and incoherent data sources, one has to deal with the varying amount of information available on different websites within a single domain. In some cases the same context words apply to more than one attribute, and the situation becomes more complicated when the relevant context words are not available in the text. BNOSA applies probabilistic reasoning, in the form of Bayesian networks, to address these problems.
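A toy sketch of how probabilistic reasoning can handle both cases follows. It assumes a two-node network (make → model), for which inference reduces to a lookup in one conditional probability table; the probabilities, attribute names and the function resolve_model are hypothetical, and the actual BNOSA networks are learnt separately from data rather than written by hand.

```python
# Hypothetical conditional probability table P(model | make).
# In BNOSA the network is learnt from data; these numbers are made up.
P_MODEL_GIVEN_MAKE = {
    "honda": {"civic": 0.6, "city": 0.3, "accord": 0.1},
    "toyota": {"corolla": 0.7, "vitz": 0.3},
}


def resolve_model(make, candidates=None):
    """Pick the most probable model given the make.

    With candidates, choose among them (conflict resolution).
    Without candidates, choose among all models known for the make
    (missing-value prediction).
    """
    distribution = P_MODEL_GIVEN_MAKE[make]
    options = candidates if candidates else list(distribution)
    # Candidates unknown for this make get probability 0.
    return max(options, key=lambda model: distribution.get(model, 0.0))


# Conflict: Phase-I returned two candidate models for a Honda ad.
print(resolve_model("honda", ["city", "corolla"]))
# Missing value: no model was extracted for a Toyota ad.
print(resolve_model("toyota"))
```

With more attributes the same idea needs proper inference over the full network, for which a dedicated Bayesian network library would be the natural choice.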

Fig. 1. Graphical representation of BNOSA Framework

To test the performance of the BNOSA framework, three case studies based on the selling/purchasing ads of used cars, laptops and cell phones were selected. Table 1 presents the precision and recall values at the end of Phase-I. The prediction accuracy of the missing values as a result of Phase-II is shown in Fig. 2.
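For reference, precision and recall for a single attribute can be computed as in the sketch below. It uses the standard definitions (correct extractions divided by all extractions, and by all values present in the gold standard); the paper does not spell out its exact counting scheme, so this is an assumption, and the sample data is invented.

```python
def precision_recall(extracted, gold):
    """Compute (precision, recall) for one attribute.

    extracted, gold: dicts mapping an ad id to the attribute value.
    """
    correct = sum(
        1 for ad_id, value in extracted.items() if gold.get(ad_id) == value
    )
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented example: four ads carry a year, three were extracted, two correctly.
gold = {1: "2005", 2: "2008", 3: "2010", 4: "1999"}
extracted = {1: "2005", 2: "2007", 3: "2010"}
precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```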

Table 1. Performance of BNOSA using extraction ontology after Phase-I

Laptop Ads: Hard Disk, Display Size, Price, Brand Name, Speed, Ram
Cell Phone Ads: Memory, Color, Price, Model
Car Ads: Mileage, Transmission, Color, Make, Model, Price, Year

Precision% v: 100 100 100 100 97 100 | 98.1 100 100 100 | 23.4 100 61.1 91.2 100 98 94
Recall% v: 98 87 100 87 94 97.8 | 85.2 97.8 100 96.2 | 21.7 98.4 66.7 52.5 37.9 98 94
Precision% m: 92 82 100 45 89 100 | 68 97 100 100 | 73.9 97.5 100 2.33 7.81 100 100
Recall% m: 100 100 100 100 89 100 | 94.4 100 100 100 | 100 100 82.4 100 100 100 100

Fig. 2. Phase-II prediction results of attributes in three domains

References

1. Michelson, M., Knoblock, C.A.: Semantic annotation of unstructured and ungrammatical text. In: Proceedings of 19th International Joint Conference on Artificial Intelligence, pp. 1091–1098 (2005)

2. Ding, Y., Embley, D., Liddle, S.: Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies. In: Mizoguchi, R., Shi, Z.-Z., Giunchiglia, F. (eds.) ASWC 2006. LNCS, vol. 4185, pp. 400–414. Springer, Heidelberg (2006)

3. Yildiz, B., Miksch, S.: OntoX - a method for ontology-driven information extraction. In: Gervasi, O., Gavrilova, M.L. (eds.) ICCSA 2007, Part III. LNCS, vol. 4707, pp. 660–673. Springer, Heidelberg (2007)

4. Rajput, Q.N., Haider, S.: Use of Bayesian Network in Information Extraction from Unstructured Data Sources. International Journal of Information Technology 5(4), 207–213 (2009)
