WELCOME PhD Journey In India
By: Boshra F. Zopon Al_Bayaty
Prof. Dr. Shashank D. Joshi (Guide)
Knowledge Discovery from Web Search
OUTLINE
PhD Course Work
Knowledge Discovery from Web Search
National and International Conferences
The Research Contribution
Conclusion and Suggestion for Future Work
Knowledge Discovery From Web Search, PhD Journey
PhD Course Work • The Students Play an important part in College development
INTRODUCTION
Knowledge discovery is the process of extracting useful information from a source of information or data by combining
machine learning, statistical analysis, search engines, modeling techniques and natural language processing.
Knowledge discovery is an extension of information retrieval (IR), and information retrieval is an extension of data mining. Therefore,
IR and data mining processes support knowledge discovery directly or indirectly.
Because of the popularity of computers and networks, the Internet has become the most important information source.
Traditionally, people use keywords and simple Boolean algebra to search for related articles.
The best example of a knowledge discovery tool is a search engine, which helps to extract information. Evaluating a
web search engine is key to ensuring the effectiveness, efficiency, scalability and usability of these browsing methods.
Because keyword search on the Internet gives imprecise results, studies of web mining methods try to improve
the accuracy or value of the information obtained from web pages.
Although keyword search is the most efficient and popular method of finding related information on the Internet, it has
two problems:
1. Some search results do not match the user’s requirement.
2. The search results contain too many similar articles.
Because of these two problems, users spend a lot of time organizing the search results to find what they really want.
Discovering the sense of a word with the help of its context is done by Word Sense Disambiguation (WSD), which is an open problem in Natural Language Processing.
Word Sense Disambiguation is the ability to computationally determine which sense of a word is being used in a given context.
The main WSD combination methods are stacking and voting; voting can be weighted or non-weighted.
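To illustrate the difference between the two voting schemes, the sketch below combines the sense labels predicted by several classifiers for one word occurrence. It is a minimal Python illustration, not the thesis implementation: the classifier names, sense labels and weights are hypothetical (in practice a weight could be something like each classifier's accuracy).

```python
from collections import Counter, defaultdict


def unweighted_vote(predictions):
    """Non-weighted (majority) voting: every classifier casts one vote."""
    counts = Counter(predictions.values())
    # most_common(1) -> [(sense, votes)]; ties go to the sense counted first
    return counts.most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Weighted voting: each vote counts with the classifier's weight."""
    scores = defaultdict(float)
    for classifier, sense in predictions.items():
        scores[sense] += weights[classifier]
    # Return the sense with the largest total weight
    return max(scores, key=scores.get)


# Hypothetical predictions for one occurrence of the word "straight"
predictions = {"DL": "direct", "AB": "honest", "NB": "honest"}
weights = {"DL": 0.9, "AB": 0.3, "NB": 0.3}  # hypothetical weights

print(unweighted_vote(predictions))         # honest (2 votes to 1)
print(weighted_vote(predictions, weights))  # direct (0.9 vs 0.6)
```

The two schemes disagree here because one strongly weighted classifier outweighs two weakly weighted ones.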
Problem Definition
Fig. 2. Screenshot from WordNet showing the multiple meanings of the word “straight”
Goals and objectives set for the research work are as follows:
1. To analyze the influence of context on determining the sense of a given word, using a technique that creates a separate context for every sense of every word.
2. To study the different techniques used for knowledge discovery, apply them to the disambiguation process, and improve its accuracy.
3. To design and implement a new model called the “Master-Slave” model.
4. To evaluate the performance of the proposed model with metrics such as precision, recall and F-measure.
Goals and Objectives
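Objective 4 relies on precision, recall and F-measure. A minimal sketch of the usual WSD definitions follows, assuming a system may abstain on some instances: precision counts correct answers over attempted instances, recall counts them over all instances, and F-measure is their harmonic mean. The function name and data layout are hypothetical, and this is not the evaluation code used in the thesis.

```python
def wsd_scores(gold, predicted):
    """Return (precision, recall, F-measure) in percent.

    gold:      instance id -> correct sense
    predicted: instance id -> predicted sense, or None if the system abstains
    """
    attempted = [i for i in gold if predicted.get(i) is not None]
    correct = sum(1 for i in attempted if predicted[i] == gold[i])
    precision = 100.0 * correct / len(attempted) if attempted else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    # F-measure: harmonic mean of precision and recall
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


gold = {1: "direct", 2: "honest", 3: "direct", 4: "direct"}
predicted = {1: "direct", 2: "direct", 3: "direct", 4: None}
p, r, f = wsd_scores(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=66.67 R=50.00 F=57.14
```

Abstaining on instance 4 lowers recall but not precision, which is why the two metrics can move in opposite directions.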
Supervised Algorithms Suggested in the Research Work
Naive Bayes (NB)
Decision Tree (DT)
Decision List (DL)
AdaBoost (AB)
Support Vector Machine (SVM)
System Requirements and Analysis
Fig. 5. The five selected supervised algorithms
MASTER – SLAVE MODEL
[Figure: Master-Slave model diagram. Labels: Input Data Set; Slave Classifiers C1 … Cn; Master Classifier; O/P; Output; The Reputation]
THE REFERENCE OF THE CONTEXT
http://www.e-quran.com/language
Fig. 9. The source of the data set
The Source of Context: The words on which word sense disambiguation is performed are selected from one paragraph of the holy book “Al_Quran” [E-QURAN.COM], as shown in Fig. 8.
System Requirements and Analysis
Fig. 8. The source of the data set
• At this stage, the accuracy of each individual algorithm is still not up to the mark.
• Decision List was selected as the Master approach for two reasons:
1. It achieved the highest accuracy.
2. Its reputation: the decision list is one of the most robust approaches to sense disambiguation, with a long
history in the field (e.g., Kelly and Stone, 1975; Black, 1988). Historical performance plays a vital role in deciding
whether an algorithm acts as Master or Slave in the suggested model, and previously reported results give the
decision list a good reputation in WSD.
TABLE 3. The final results of the five supervised approaches
No.  Approach        Accuracy (%)
1.   Decision List   69.12
2.   Adaboost        65.27
3.   Naïve Bayes     62.86
4.   SVM             56.11
5.   Decision Tree   45.14
System Design: Selecting the Master Approach (The First Part of the System)
Fig. 22: Final accuracy of the algorithms (bar chart of Accuracy (%): Decision List 69.12, Adaboost 65.27, Naïve Bayes 62.86, SVM 56.11, Decision Tree 45.14)
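The accuracy criterion for choosing the Master can be sketched as a simple ranking over the Table 3 values. This is a minimal illustration under the assumption that only accuracy is used: the reputation criterion described above is qualitative and is not modeled, and the `n_slaves` parameter is hypothetical.

```python
# Accuracies (%) of the five supervised approaches, copied from Table 3
ACCURACIES = {
    "Decision List": 69.12,
    "Adaboost": 65.27,
    "Naive Bayes": 62.86,
    "SVM": 56.11,
    "Decision Tree": 45.14,
}


def select_master(accuracies, n_slaves=2):
    """Rank approaches by accuracy: the best is Master, the next n_slaves are Slaves."""
    ranked = sorted(accuracies, key=accuracies.get, reverse=True)
    return ranked[0], ranked[1:1 + n_slaves]


master, slaves = select_master(ACCURACIES)
print(master, slaves)  # Decision List ['Adaboost', 'Naive Bayes']
```

With two slaves, the ranking picks Adaboost and Naive Bayes, the same two algorithms that act as Slaves in the experiments reported later.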
System Development and Implementation Algorithm
Input: data set, context, choice of algorithm
Output: correct sense according to the context
Process: word sense disambiguation
Step 1. Select the data set, data source, context and algorithm.
Step 2. For all words (W) in the data set, for all senses (S):
Step 3.   (Features) Find the POS from the data source (d).
Step 4.   Use the Master-Slave algorithms.
Step 5.   Calculate sense-wise P, R and F.
Step 6.   Select the sense with the highest value.
Step 7.   Sum all accuracies to calculate the overall accuracy.
Step 8.   Add the boosting factor.
Step 9.   Display the sense accuracy.
End loop (senses)
End loop (words)
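The nested loop in Steps 2 to 7 can be sketched as follows. This is a hypothetical illustration: `senses_of` stands in for the data source (for example a dictionary such as WordNet), `score` stands in for the Master-Slave classifier output, and Step 7 is assumed to average the per-word results into a percentage. Feature extraction (Step 3) and the boosting factor (Step 8) are left out.

```python
def disambiguate(words, senses_of, score):
    """Steps 2-6: for every word W and every sense S of W, keep the best-scoring sense."""
    chosen = {}
    for word in words:                    # outer loop: all words in the data set
        best_sense, best_value = None, float("-inf")
        for sense in senses_of(word):     # inner loop: all senses of the word
            value = score(word, sense)
            if value > best_value:        # Step 6: keep the highest value
                best_sense, best_value = sense, value
        chosen[word] = best_sense
    return chosen


def overall_accuracy(chosen, gold):
    """Step 7 (assumed to be an average): share of words whose chosen sense is correct, in %."""
    correct = sum(1 for word, sense in chosen.items() if gold.get(word) == sense)
    return 100.0 * correct / len(chosen) if chosen else 0.0


# Toy data: hypothetical senses and scores
SENSES = {"straight": ["direct", "honest"], "path": ["road", "course_of_action"]}
SCORES = {
    ("straight", "direct"): 0.8, ("straight", "honest"): 0.2,
    ("path", "road"): 0.4, ("path", "course_of_action"): 0.6,
}
chosen = disambiguate(["straight", "path"], SENSES.get, lambda w, s: SCORES[(w, s)])
print(chosen)  # {'straight': 'direct', 'path': 'course_of_action'}
print(overall_accuracy(chosen, {"straight": "direct", "path": "road"}))  # 50.0
```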
Step 1. Collect the accuracy of the Master, X %.
Step 2. Collect the accuracy of the Slave, y %.
Step 3. Collect the votes to improve X using the factor F = (X - f)/100.
Step 4. Accuracy of word = old accuracy + F.
Step 5. Apply this factor to all words X1, X2, X3…, and X15.
Step 6. Calculate precision, recall and F-measure.
System Design: The Second Part of System
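Steps 3 to 5 of the second part can be written directly as code. The slides define X as the Master accuracy and collect the Slave accuracy y in Step 2, but the formula uses f, whose relation to y is not stated; the sketch therefore takes f as an explicit input instead of guessing. The function names and the sample inputs are hypothetical.

```python
def boosting_factor(master_accuracy, f):
    """Step 3: F = (X - f) / 100, where X is the Master accuracy in %.

    f is taken as given; the slides do not define it.
    """
    return (master_accuracy - f) / 100.0


def boost_word_accuracies(word_accuracies, factor):
    """Steps 4-5: new accuracy = old accuracy + F, applied to every word X1 ... Xn."""
    return {word: accuracy + factor for word, accuracy in word_accuracies.items()}


# Hypothetical inputs, for illustration only
factor = boosting_factor(69.12, 19.12)  # about 0.5
print(boost_word_accuracies({"Praise": 60.0, "Path": 70.0}, factor))
```

Because F is added as a constant, every word receives the same absolute boost regardless of its original accuracy.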
Before Combination
No.  Approach   Recall   Precision   F-measure
1    N.Bayes    30.573   62.86       188.58
2    D. List    44.033   69.126      207.38
3    Adaboost   45.92    65.273      195.82
Discussion on Results (Before Combination)
[Chart: COMPARATIVE ANALYSIS OF PRECISION per word (Praise, Name, Worship, Worlds, Lord, Owner, Recompense, Trust, Guide, Straight, Path, Anger, Day, Favored, Help); series: 1st Experiment Precision, 2nd Experiment Precision]
The Master-Slave model is evaluated in three experiments. In the first experiment, Decision List acts
as Master and Naïve Bayes acts as Slave. Individually, each algorithm gives good precision
and F-measure values.
Fig. 27: Comparative analysis graph
After Combination
Approach                               Recall     Precision   F-measure
1st Experiment (N.Bayes + D.L)         68.46667   51.06       1531.8
2nd Experiment (D.L + Ada)             52.61333   69.23333    2077
3rd Experiment (N.Bayes + Ada + D.L)   47.37333   70.14667    2104.4
[Chart: COMPARATIVE ANALYSIS OF RECALL per word (Praise, Name, Worship, Worlds, Lord, Owner, Recompense, Trust, Guide, Straight, Path, Anger, Day, Favored, Help); series: 1st Experiment Recall, 2nd Experiment Recall]
Second experiment: Decision List acts as Master and Adaboost acts as Slave. The accuracy details are as follows:
Overall precision is 69.23% and recall is 52.61%, so the results of the experiment are satisfactory; the overall
rise in recall and precision is 85.80 and 1.0733 respectively.
Third experiment: the accuracy details are as follows:
Overall precision is 70.14% and recall is 47.37%, giving a rise of 48.73 in precision and 14.53 in recall.
First experiment: the accuracy details are as follows:
Overall precision is 51.06% and recall is 68.46%, which favors recall over precision.
Fig. 28: Comparative analysis graph
Discussion on Results (After Combination)
Enhancement
Approach                               Recall     Precision   F-measure
1st Experiment (N.Bayes + D.L)         378.9367   -118        -354
2nd Experiment (D.L + Ada)             85.8033    1.0733      3.2
3rd Experiment (N.Bayes + Ada + D.L)   14.5333    48.7367     146.2
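The Enhancement values for recall and precision appear consistent with ten times the difference between an After Combination value and one Before Combination row (N.Bayes for the 1st experiment, D. List for the 2nd, Adaboost for the 3rd). That relation and the baseline pairing are inferred from the numbers, not stated in the slides; the sketch below only checks the arithmetic.

```python
# (before, after) pairs copied from the Before Combination and After Combination tables.
# Which single-algorithm row serves as "before" for each experiment is an
# inference from the numbers, not something the slides state.
EXPERIMENTS = {
    "1st Experiment (N.Bayes + D.L)": {"recall": (30.573, 68.46667), "precision": (62.86, 51.06)},
    "2nd Experiment (D.L + Ada)": {"recall": (44.033, 52.61333), "precision": (69.126, 69.23333)},
    "3rd Experiment (N.Bayes + Ada + D.L)": {"recall": (45.92, 47.37333), "precision": (65.273, 70.14667)},
}


def enhancement(before, after, scale=10.0):
    """Assumed relation: Enhancement = scale * (after - before)."""
    return scale * (after - before)


for name, metrics in EXPERIMENTS.items():
    rec = enhancement(*metrics["recall"])
    prec = enhancement(*metrics["precision"])
    print(f"{name}: recall {rec:.4f}, precision {prec:.4f}")
```

Running the loop reproduces the recall and precision columns of the Enhancement table to four decimals.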
[Chart: COMPARATIVE ANALYSIS OF F-MEASURE per word (Praise, Name, Worship, Worlds, Lord, Owner, Recompense, Trust, Guide, Straight, Path, Anger, Day, Favored, Help); series: 1st Experiment F-Measure, 2nd Experiment F-Measure]
Third experiment: precision and F-measure increase by 48.7367 and 146.2 respectively; this combination gives
the best all-round performance in terms of precision.
Second experiment: precision increases by 1.0733 and F-measure by 3.2, while recall is lower than in the first
experiment. This is an enhancement in precision for resolving the word sense disambiguation problem.
First experiment: combining the two algorithms enhances recall, which might be useful in applications such as
search engines that require more coverage of the sample space, but it is less useful for word sense
disambiguation.
Fig. 29: Comparative analysis graph
Discussion on Results (Enhancement)
Empower WSD with social networks.
Master-Slave modeling is needed in a number of applications. For example, when a user enters a query, the query could be
refined with the help of information or tags received from a social networking site, drawn from that individual’s profile or
from the things the individual likes. This process would not only help ensure the correct sense of a word but also increase the
accuracy of the displayed results.
Empower online translation
A web-based WSD service that runs online and provides an interface between the user and the system, supporting applications such as
Google or Bing translation, so that users can easily comprehend the output.
M-S model for other languages
Extend the Master-Slave model to support more languages, such as Arabic, Hindi, German and so on.
Conclusion and Suggestion for future Work
The advantages of this work are improved accuracy, disambiguation of word senses, and an analysis of the relationship among
the data set, the algorithm and the context.
The proposed solution provides a good level of accuracy. The results of the experiments in this research are as anticipated,
delivering an accuracy of 70.14%.
WSD is still one of the central open challenges in NLP.
The Research Contribution
• Model
A proposed model that combines supervised algorithms in a Master-Slave arrangement.
• Algorithm
The experiments use a novel Master-Slave algorithm with a boosting factor. This Master-Slave
algorithm (a unique algorithm) is formed by selecting the best set of algorithms to improve the
accuracy of disambiguation.
• Design
The Master-Slave algorithm performs efficiently with the help of the boosting factor, which
depends on the error rate and adjusts the accuracy.
• Performance Optimization
The experimental results, presented as graphs, show that the selected algorithms and design
improve the accuracy to 70.14%, which helps to disambiguate senses efficiently.
• The novel approach has been compared with all the other approaches to demonstrate its
advantage.
National conference
Attended and published a paper, National Conference in Computer Science and Information Technology, organized by Y
M College, Pune, 27-28 Sept. 2013.
Attended and published a paper, National Conference on Modeling, Optimization and Control (NCMOC), 4-6 March 2015.
Attended National Conference on Advance Technologies for Secured Communication Using 4G & LTE
(ATSC-2014), B. V. U. College of Engineering, Pune, 5-6 February 2014.
Attended National Conference FOSSsumMIT’14, in association with Pune Linux Group, Department of
Computer Engineering, MITCOE, Pune, 1-2 August 2014.
International Conferences
International Conference, IEEE Canada, IHTC, Ottawa, http://www.ihtc2015.ieee.ca/, 31 May - 4 June 2015.
International Conference on Knowledge and Software Engineering, December 6-7 2014, Paris, France.
International Conference on Emerging Trends in Science and Cutting Edge Technology (ICETSCET),
YMCA, New Delhi, 28 September, 2014. www.icetscet.com.
International Conference on Current Advances in Engineering and Technology (ICET-14), Knowledge and
Software Engineering, Trivandrum, Kerala, IFERP Connecting Engineers... Developing Research (Unit of
VVERT), 14 December 2014. www.icet.com.
National and International Conferences
• International Conferences
Ottawa, Canada; Paris, France
• International Conferences
Trivandrum and New Delhi
SOME SUGGESTIONS
Advantages of Workshops.
The progress reports and scientific research.
The main three stages for the PhD degree.
Very Positive Result.
•ADVANTAGES OF WORKSHOPS
SIX MONTHLY PROGRESS REPORTS
REVIEW AND COMMENTS FROM FIRST PRESENTATION
Introduction
Literature Review
Problem Definition (Word Sense
Disambiguation)
Objective of Study
Methodology
Research plan
Select Research Approaches (Five Supervised
Approaches)
System Modeling (Master – Slave
Techniques)
System Requirements
Publication (2 papers)
Conclusion
Source of Bibliography
References
Sr. No.  Comment                                                    Status
1.       Data normalizing is required                               Done
2.       Refer to more papers based on supervised neural networks   Done
Table 1. The status of the first presentation comments
The Three Stages For PhD degree
Review for Second Presentation
Introduction
Literature Review (Revised)
Problem Definition
Objective of Study
Motivation
Methodology
The Work Done So Far
Jump to Master – Slave Technique
The Reference of Context and Data Set selected
(Sys. Requirements and Data Normalization)
Modeling – Designing – Compilation
Supervised Approaches under Study Implemented
The Comparative Analysis of the Results
The Limitation and Suggestion for future work
Conclusion
System Development Life Cycle (SDLC) Phases
The Research Contribution in Knowledge and Scientific Research.
Bibliography
Activities and Publications
REVIEW AND COMMENTS FROM SECOND PRESENTATION
Sr. No.  Comment / Status
1. Comment: The candidate presented the program of work, which was in line with the approved
   objectives. The use of decision trees and supervised learning is suggested.
   Status: Done, by clarifying the decision tree with an example related to the implementation.
2. Comment: The thesis hypothesis could be revisited.
   Status: The hypotheses or assumptions made are listed below:
   1. To perform the combination, the selected algorithms should be based on individual
   performance and reputation.
   2. To disambiguate the sense, the context has to be selected.
   3. To know the POS and senses, the word source referred to must be trusted.
   4. Improvement in the accuracy of disambiguation.
   5. Increased algorithm performance using the Master-Slave system.
   6. Improvement in word sense disambiguation irrespective of the amount of data set,
   data source and context.
   7. To improve the algorithm with all combinations.
Table 2. The status of the second presentation comments
The Three Stages For PhD degree
VERY POSITIVE RESULT.
Google Scholar search