CHAPTER 6
EXPERIMENTAL EVALUATION
6.1 EXPERIMENTAL SETUP
In this chapter, the dataset used for the experimental setup is collected
from Folksonomy-oriented bookmarking sites. Experiments are conducted to
find relevant tags in order to provide effective tag recommendations for a
resource. For efficient tag recommendation, the Folksonomy dataset
collection is based on blog contents. Common metrics such as precision,
recall and F-measure are discussed briefly and the results of the
experiments are presented. For each item of testing data, tags are
extracted from blogs using a keyword extraction method, and interest
scores for the keywords are computed to set up the input for
recommendation. BibSonomy and Delicious are examples of datasets from
which tags are extracted from blogs.
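As a rough illustration of the keyword-extraction step described above, the following Java sketch assigns a simple frequency-based interest score to each keyword in a blog text. The stop-word list and the use of raw term frequency as the interest score are assumptions made for illustration only; the actual scoring formula used in this work is described in the earlier chapters.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Minimal sketch: extract keywords from blog text and assign
// frequency-based interest scores. Stop-word list is illustrative.
public class KeywordExtractor {
    private static final Set<String> STOPWORDS =
            Set.of("the", "a", "is", "of", "and", "to");

    public static Map<String, Integer> interestScores(String blogText) {
        Map<String, Integer> scores = new HashMap<>();
        for (String token : blogText.toLowerCase().split("\\W+")) {
            if (token.isEmpty() || STOPWORDS.contains(token)) continue;
            // Raw frequency stands in for the interest score here.
            scores.merge(token, 1, Integer::sum);
        }
        return scores;
    }
}
```

Scores produced this way can then be updated whenever a new keyword occurs, as described in Section 6.3.2.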
6.2 DATASETS
The training data set consists of tags, titles and categories from
Wikipedia, together with semantic relationships from WordNet. The main
objective of the training phase is to construct a topic ontology related
to the tags. Existing blog tags are used as the test data set. To evaluate
the proposed recommendation approach, datasets have been chosen from two
different Folksonomy systems, namely Delicious and BibSonomy. BibSonomy
and Delicious are popular social networking systems that have been applied
to research work on a wide scale, providing various services that produce
interesting recommendations for the community. These services permit users
to convey their thoughts on resources in their own words.
6.2.1 Delicious
Delicious datasets have been used for a limited period of time, during
which a user can build bookmarks to URLs and share them with others.
Delicious is one of the popular collaborative bookmarking sites, which
permits users to tag blogs and web pages. Figure 6.1 shows a snapshot of
Delicious.
Figure 6.1 Snapshot of Delicious
Table 6.1 Most Frequent Domains in the Delicious Corpus

S. No.  Domain                      Bookmarks   Users
1       en.wikipedia.org            937,785     305,739
2       www.flickr.com              892,157     262,963
3       www.youtube.com             890,769     256,126
4       www.google.com              772,460     176,890
5       www.nytimes.com             613,676     121,575
6       www.amazon.com              541,314     94,093
7       news.bbc.co.uk              416,878     85,910
8       lifehacker.com              369,078     80,728
9       community.livejournal.com   320,021     39,755
10      www.microsoft.com           310,701     131,847
Table 6.1 shows the most frequent domains in the Delicious
corpus. Tags can be added to a user's bookmarks to explain, search, share
and classify the bookmarks. The most recent bookmarks and their
corresponding tags are shown on Delicious' front page. Delicious also has
a popular page that presents the same information for the most popular
URLs. A set of 10 tags has been considered for this research work. For
these tags, 23,701 URLs are retrieved. The tags that occur with the
largest frequency and the most popular tags have been obtained. Finally,
201,711 tags are retrieved, with 89% of tags per URL. It is easy to find
the relevant topics since many users tag the content. Table 6.2 shows the
top 10 popular URLs in the Delicious corpus, and Figure 6.2 shows the
popular URLs in the Delicious corpus graphically.
Table 6.2 Top 10 Popular URLs in the Delicious Corpus

S. No.  URL                 Bookmarks
1       www.flickr.com      35,732
2       www.pandora.com     35,531
3       script.aculo.us     31,643
4       www.netvibes.com    30,782
5       en.wikipedia.org    27,672
6       www.youtube.com     26,183
7       slashdot.org        25,630
8       www.last.fm         23,957
9       oswd.org            21,530
10      www.alvit.de        21,130
Figure 6.2 Popular URLs in the Delicious Corpus
6.2.2 BibSonomy
BibSonomy is possibly the best investigated Folksonomy to date in
which user can accumulate and interpret URLs and publications as well.
Bibsonomy dataset is employed for tag recommendation challenge. Users,
resources, tags or keywords are considered as datasets. Other additional data
have been disregarded or ignored for all practical purposes. A set of 10 tags
has been chosen randomly from the tag list. Bookmark content has been
received for each tag with respect to relevant tags. Figure 6.3 shows the
snapshot of BibSonomy.
Figure 6.3 Snapshot of BibSonomy
6.3 CHOICE OF LANGUAGE FOR IMPLEMENTATION
6.3.1 Java
Java is a simple, portable, object-oriented, distributed, secure,
interpreted, robust, architecture-neutral, multithreaded and dynamic
programming language. Java has significant advantages over other languages
and environments that make it suitable for the programming tasks at hand,
and it has become a language of choice for implementing concepts that
provide worldwide internet solutions.
The IDE used is NetBeans 6.0. Initially, the front end is designed with
the Macromedia Dreamweaver 8 tool. The processes of article and relation
extraction are performed using JSP and Core Java.
6.3.2 MS-Access
MS-Access has been used to create multiple relational tables and
store the data. MS-Access allows the user to create relationships between
similar fields across different tables or queries.
MS-Access is used as the back end for the storage and retrieval process.
User details are saved for authentication purposes and can be updated
dynamically each time a new user enters. Keywords from existing blogs are
extracted and interest scores are applied for the recommendation process.
These scores are updated dynamically whenever a new keyword occurs. Based
on the most highly activated scores, tags are suggested and represented
graphically using MATLAB 7.5.0.342 (R2007b).
6.3.3 MATLAB (Matrix Laboratory)
With huge quantities of information circulating around the web,
various samples of datasets need to be considered for effective tag
recommendation. MATLAB is an ideal simulation tool for applications with
custom graphical interfaces. In this approach, the MATLAB environment
allows programs to be written using Java, and algorithms and applications
to be developed to evaluate the performance.
6.4 PERFORMANCE EVALUATION METRICS
The performance of the tag recommendation is evaluated using the
following standard metrics.
6.4.1 Precision
In Information Retrieval (IR), precision is the fraction of retrieved
instances that are relevant; it measures the quality of the recommended
tags.
Precision = |relevant tags ∩ retrieved tags| / |retrieved tags|      (6.1)
6.4.2 Recall
In IR, recall is the fraction of relevant instances that are retrieved;
it measures the completeness of the recommended tags.
Recall = |relevant tags ∩ retrieved tags| / |relevant tags|          (6.2)
6.4.3 F-Measure
F-measure combines recall and precision into one measure and is
defined as
F-Measure = (2 × Precision × Recall) / (Precision + Recall)          (6.3)
It is also called the F1 measure because precision and recall are
weighted equally.
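The three metrics in Equations (6.1)-(6.3) can be sketched in Java (the implementation language of this work) as set operations over relevant and retrieved tag sets. The class and method names below are illustrative, not taken from the thesis code:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the evaluation metrics from Equations 6.1-6.3.
public class TagMetrics {
    public static double precision(Set<String> relevant, Set<String> retrieved) {
        if (retrieved.isEmpty()) return 0.0;
        Set<String> hit = new HashSet<>(retrieved);
        hit.retainAll(relevant);                       // relevant ∩ retrieved
        return (double) hit.size() / retrieved.size(); // Equation (6.1)
    }

    public static double recall(Set<String> relevant, Set<String> retrieved) {
        if (relevant.isEmpty()) return 0.0;
        Set<String> hit = new HashSet<>(retrieved);
        hit.retainAll(relevant);
        return (double) hit.size() / relevant.size();  // Equation (6.2)
    }

    public static double fMeasure(double p, double r) {
        if (p + r == 0.0) return 0.0;
        return 2 * p * r / (p + r);                    // Equation (6.3), F1
    }
}
```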
6.5 EVALUATION AND COMPARISON
This approach is validated using data from Delicious and
BibSonomy. A sample set of 50,000 blogs has been taken into consideration;
of these, a set of 50 blogs has been set aside for testing purposes. The
training set is used to build the topic ontology in order to recommend
tags for resources in the test set, and the increase of interest for the
tags in the test set is computed. In this approach, Precision, Recall and
F-Measure are computed to evaluate the performance effectiveness. The
greater the precision, the more precise the suggested tags are; recall
indicates how likely a user is to use the suggested tags. Not all the tags
in the test set are recommended. Experimental results demonstrate
efficient tag recommendation based on the weight of the tags (interest
scores assigned to the tags) and the semantic relationships in the topic
ontology. The approach retrieves the highest-scored tags when the tags are
related to the users, and the scores are updated each time a new tag
appears. Figure 6.4 represents the interest scores for the number of tags
in a blog. It is evident from the figure that the tags used by a large
number of users show an increase in interest score.
[Plot: interest score (10-100) versus number of sample tags (1-10).]
Figure 6.4 Interest Score for the Tags
The following results are retrieved from the test set. The precision
and recall of the recommendation results for both the Delicious and
BibSonomy datasets have been obtained. The tags are then taken from the
recommendation list and used to suggest user interest for a particular
concept. The interest scores of the topic ontological tags are initialized
to one; such a recommendation set represents a condition where no initial
user interest is available. The spreading activation algorithm is applied
to update the interest scores once the topic ontology is constructed, and
the precision and recall values of the recommended results are calculated
in order to compare with the existing AutoTag approach. The updating of
interest scores for tags, illustrated in Figure 6.4, has been calculated
as a percentage. Recall here corresponds to giving input data to the
trained set and receiving the response. Table 6.3 shows the
precision and recall calculation for 10 tags of the BibSonomy and
Delicious datasets. This work clearly illustrates that the proposed topic
ontology for tag recommendation outperforms on these datasets: when a user
posts a bookmark to the system, it recommends the right set of tags to the
user.
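A minimal Java sketch of the spreading-activation update described above follows. The graph representation, edge weights and decay factor are illustrative assumptions for exposition only, not the exact algorithm or parameters used in this work:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: spreading activation over a tag ontology. When a tag is
// activated, a decayed share of the input spreads to its neighbours,
// and the highest-scored tags are recommended.
public class SpreadingActivation {
    private final Map<String, Map<String, Double>> edges = new HashMap<>();
    private final Map<String, Double> score = new HashMap<>();

    public void addEdge(String from, String to, double weight) {
        edges.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
    }

    // Add input activation to a tag, then propagate weight * decay
    // of it to each neighbouring tag in the ontology.
    public void activate(String tag, double input, double decay) {
        score.merge(tag, input, Double::sum);
        for (Map.Entry<String, Double> e
                : edges.getOrDefault(tag, Map.of()).entrySet()) {
            score.merge(e.getKey(), input * e.getValue() * decay, Double::sum);
        }
    }

    // Recommend the n tags with the highest interest scores.
    public List<String> topTags(int n) {
        return score.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

Scores accumulate across activations, which matches the description that interest scores are updated each time a new tag appears.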
Table 6.3 Precision, Recall for both BibSonomy and Delicious Datasets

Number      Precision               Recall
of Tags     BibSonomy   Delicious   BibSonomy   Delicious
1           0.20        0.21        0.124       0.128
2           0.263       0.265       0.163       0.182
3           0.301       0.310       0.217       0.228
4           0.331       0.368       0.261       0.268
5           0.361       0.393       0.310       0.298
6           0.382       0.421       0.347       0.319
7           0.418       0.440       0.385       0.337
8           0.438       0.463       0.409       0.352
9           0.463       0.481       0.428       0.370
10          0.482       0.492       0.449       0.393
6.5.1 Comparison
The tag recommendation approach is compared with the existing
AutoTag mechanism after evaluating the performance of the proposed
approach on the data collected from two different Folksonomy systems. A
Folksonomy is a social and decentralized approach formed by individuals or
groups. The existing AutoTag mechanism does not recommend newly added tags
when they are already used in a blog.
Table 6.4 Precision, Recall and F-measure for BibSonomy Datasets
Table 6.4 shows the precision, recall and F-measure for the
BibSonomy datasets for both the existing AutoTag mechanism and the
proposed Topic Ontology with Spreading Activation algorithm.
Precision is the percentage of correctly recommended tags among all
tags recommended by the algorithm. The proposed method does not explicitly
focus on frequently used tags, which creates a potential area of
improvement. If the system failed to recommend frequent tags with high
accuracy, its results could be combined with the results of a system that
focuses explicitly on these tags. To test whether such an extension is
needed, the results of the system are re-evaluated by considering the top
N ∈ [1, 10000] tags, sorted by frequency of occurrence in all posts. Posts
that contained no tags from the set of the most frequent tags were removed
from the evaluation process. It is important to note that the list of
recommended tags is not pruned by removing the low-frequency tags.
Although that would certainly improve the accuracy of the system, it would
defeat one of the purposes of the experiment, which was to determine
whether the system needs an additional module to increase the rank of
frequently used tags among all recommended tags.
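The top-N re-evaluation described above can be sketched in Java as follows. The post representation (a set of tags per post) and the helper names are hypothetical; note that only posts are filtered out, never the recommended-tag lists:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the top-N frequent-tag re-evaluation filter.
public class FrequentTagFilter {
    // Rank tags by frequency of occurrence across all posts, keep top n.
    public static Set<String> topNTags(List<Set<String>> posts, int n) {
        Map<String, Long> freq = posts.stream()
                .flatMap(Set::stream)
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        return freq.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Posts containing none of the most frequent tags are removed from
    // evaluation; recommended-tag lists themselves are NOT pruned.
    public static List<Set<String>> keepEvaluablePosts(
            List<Set<String>> posts, Set<String> topTags) {
        return posts.stream()
                .filter(p -> !Collections.disjoint(p, topTags))
                .toList();
    }
}
```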
The results of the experiment show that the system achieves a much
higher precision score when considering the most frequent tags only,
compared to the results of the system evaluated over all tags. In most
cases the largest improvement is observed for the top few tags. The
accuracy of recommendation decreases with the increasing size of the
most-frequent-tags set, which is expected behavior, given that less
frequent tags become harder to recommend. The same pattern can be observed
for user-relevant tags, which shows that spreading activation does not
impair the quality of recommendation for high-frequency tags.
[Plot: precision (0.2-0.5) versus number of recommended tags (1-10) on
BibSonomy, comparing Topic Ontology with SA against the AutoTag mechanism.]
Figure 6.5 Precision for Bibsonomy Datasets
Figures 6.5, 6.6 and 6.7 show the precision, recall and F-measure
for the BibSonomy datasets respectively. Each metric increases gradually
with the number of recommended tags used in the BibSonomy datasets. Here
the proposed algorithm achieves better performance than the existing
AutoTag mechanism on the tags, whereas it is more tedious to hit the
resources specific to the most popular tags.
[Plot: recall (0.1-0.5) versus number of recommended tags (1-10) on
BibSonomy, comparing Topic Ontology with SA against the AutoTag mechanism.]
Figure 6.6 Recall for BibSonomy Datasets
[Plot: F-measure (0.1-0.5) versus number of recommended tags (1-10) on
BibSonomy, comparing Topic Ontology with SA against the AutoTag mechanism.]
Figure 6.7 F-Measure for Bibsonomy Datasets
Table 6.5 Precision, Recall and F-Measure for Delicious Datasets
Table 6.5 shows the Precision, Recall and F-Measure for the
Delicious datasets for both the existing AutoTag mechanism and the
proposed Topic Ontology with Spreading Activation algorithm.
[Plot: precision (0.2-0.5) versus number of recommended tags (1-10) on
Delicious, comparing Topic Ontology with SA against the AutoTag mechanism.]
Figure 6.8 Precision for Delicious Datasets
[Plot: recall (0.1-0.45) versus number of recommended tags (1-10) on
Delicious, comparing Topic Ontology with SA against the AutoTag mechanism.]
Figure 6.9 Recall for Delicious Datasets
[Plot: F-measure (0.1-0.5) versus number of recommended tags (1-10) on
Delicious, comparing Topic Ontology with SA against the AutoTag mechanism.]
Figure 6.10 F-Measure for Delicious Datasets
Figures 6.8, 6.9 and 6.10 illustrate the Precision, Recall and
F-Measure for the Delicious datasets respectively. Each metric gradually
increases as more recommended tags are used in the Delicious datasets. The
proposed algorithm achieves better performance than the existing AutoTag
mechanism on the tags, whereas it is more difficult to hit the resources
specific to the most popular tags. Though the proposed approach identifies
the semantics of tags and resources, its approach to discovering semantics
differs from the AutoTag mechanism. The detailed dataset holds essential
metrics and plots and so provides better results.
6.6 RESULTS AND DISCUSSION
This research work has calculated the Precision, Recall and
F-Measure values for 10 tags correspondingly. The tag recommendation
approach achieves higher precision than the existing AutoTag
recommendation approach, and recall improves with the number of
recommended tags. The proposed topic ontology with spreading activation
based tag recommendation approach is experimentally demonstrated to reach
92.35% of the best promising performance when tags are recommended, which
is much higher than the existing approach. Figures 6.11, 6.12 and 6.13
show the performance comparison of Precision, Recall and F-Measure for the
BibSonomy datasets.
Figure 6.11 Comparison of Precision for BibSonomy datasets
Figure 6.12 Comparison of Recall for BibSonomy datasets
Figure 6.13 Comparison of F-Measure for BibSonomy datasets
Figures 6.14, 6.15 and 6.16 show the performance comparison of
Precision, Recall and F-Measure for the Delicious datasets.
Figure 6.14 Comparison of Precision for Delicious datasets
Figure 6.15 Comparison of Recall for Delicious datasets
Figure 6.16 Comparison of F-Measure for Delicious datasets
6.7 CONCLUSION
In this chapter, experiments demonstrated that tag occurrences are
utilized to present more related tag recommendations to the users.
Experiments on real-world datasets were conducted and showed that topic
ontology with spreading activation outperforms the existing AutoTag
mechanism. The conclusions of the experiments demonstrated in this
research work are: the development of the topic ontology design in tag
recommendation yields a major advantage; the most popular tags attained
reasonable Precision, Recall and F-measure on the Delicious and BibSonomy
datasets; and the topic ontology with spreading activation approach yields
high precision, recall and F-Measure for both Delicious and BibSonomy.