CSc288 Term Project: Data Mining to Predict the Voice-over-IP Phone Market
----- Huaqin Xu
Agenda
Abstract, Introduction, Methodology, Result, Conclusion, Learning Experience, References
Abstract
This project is based on VoIP survey data sets. Classifiers from the Weka Explorer were chosen as the data mining tool to build models that predict potential customers of VoIP phones and identify the most important features and services of two VoIP phone models.
Introduction
Background
  VoIP phones have a potential market opportunity given the wide use of internet service.
  Two VoIP phone models: Basic and Deluxe
Data mining scope
  Customers
  Product features and services
Methodology
Data mining tools considered: C4.5/C5.0, Cubist, Weka, Microsoft SQL Server, SPSS
Chosen: Weka Explorer
Why? Free, easy to use, good interface, more choices
Explorer Vs KnowledgeFlow
Dataset: 94 instances in total
Preprocessing
  Split the survey table: Customer (17 attributes), Basic-model (14 attributes), Deluxe-model (10 attributes)
  Handling missing data: delete the record, or replace the missing value with "?"
  Data conversion: SPSS to Excel to Weka
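The last conversion step (a spreadsheet export into Weka's ARFF format) can be sketched as below. This is only a minimal illustration, not the procedure used in the project: the column names and sample rows are hypothetical, every attribute is assumed to be nominal, and values are assumed to contain no spaces or commas (which would need quoting in ARFF). Blank cells are written as "?", the missing-value marker mentioned above.

```python
import csv
import io

def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text with nominal attributes."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Blank cells are treated as missing and written as "?".
    data = [[cell.strip() or "?" for cell in row] for row in data]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        # Declare each attribute with its distinct non-missing values, sorted.
        values = sorted({row[i] for row in data if row[i] != "?"})
        lines.append("@attribute %s {%s}" % (name.strip(), ",".join(values)))
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"

# Hypothetical two-row sample; the real survey columns are not shown in the slides.
sample = "gender,know_voip,would_buy\nMale,yes,yes\nFemale,,no\n"
arff = csv_to_arff(sample, "customer")
print(arff)
```

The resulting text can be saved with an .arff extension and opened in the Weka Explorer.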
Algorithm selection: Classification, Clustering, Association
Chosen: NNge
Why? High accuracy rate; simple, clear rules
Algorithm         Correctly classified instances (%)
NaiveBayes        63.82
DecisionStump     65.95
Id3               84.04
J48               75.53
NBTree            79.78
ConjunctiveRule   69.14
DecisionTable     80.85
NNge              87.23
OneR              71.27
PART              72.34
Prism             88.29
Ridor             71.27
JRip              74.46
ZeroR             63.83
AdaBoostM1        65.95
BayesNet          60.63
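As a quick sanity check on the table above, each percentage can be converted back into a count of correctly classified instances and compared with ZeroR, which always predicts the majority class and so serves as a baseline. The sketch assumes that every percentage was measured on the same 94 instances; the slides do not state this explicitly.

```python
TOTAL_INSTANCES = 94  # dataset size stated in the slides

# Percentages copied from the comparison table.
accuracy_pct = {
    "NaiveBayes": 63.82, "DecisionStump": 65.95, "Id3": 84.04, "J48": 75.53,
    "NBTree": 79.78, "ConjunctiveRule": 69.14, "DecisionTable": 80.85,
    "NNge": 87.23, "OneR": 71.27, "PART": 72.34, "Prism": 88.29,
    "Ridor": 71.27, "JRip": 74.46, "ZeroR": 63.83, "AdaBoostM1": 65.95,
    "BayesNet": 60.63,
}

# Assumption: every percentage refers to the same 94 instances.
correct = {name: round(pct * TOTAL_INSTANCES / 100)
           for name, pct in accuracy_pct.items()}
baseline = correct["ZeroR"]

# Rank algorithms by number of correctly classified instances.
ranked = sorted(correct.items(), key=lambda item: item[1], reverse=True)
for name, count in ranked:
    print("%-16s %2d/%d correct (%+d vs. ZeroR)"
          % (name, count, TOTAL_INSTANCES, count - baseline))
```

Under that assumption, the 87.23% reported for NNge corresponds to 82 of 94 instances.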
NNge classifier
  A nearest-neighbor-like algorithm using non-nested generalized exemplars.
  A rule-based classifier that builds a sort of “hypergeometric” model.
  Shows promise as a machine learning method that performs well on a wide range of datasets.
Result
Rules:
One of the customer rules:
class Would_Buy IF:
  cost in {10-20}
  ^ phone in {yes}
  ^ email in {yes}
  ^ fax in {no}
  ^ chat in {yes,no}
  ^ other in {no}
  ^ service type in {Phone_cards_only}
  ^ price in {Somewhat_Dissatisfied, Somewhat_Satisfied}
  ^ voice_quality in {Somewhat_Dissatisfied, Somewhat_Satisfied}
  ^ service in {Somewhat_Dissatisfied}
  ^ convenience in {Somewhat_Satisfied}
  ^ promotion in {Somewhat_Dissatisfied}
  ^ Know VoIP in {yes,no}
  ^ marital status in {Single}
  ^ gender in {Male}
  (11)
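A rule of this form is a conjunction of set-membership tests, one per attribute. The sketch below shows how such a rule could be checked against one survey response. It is a simplified illustration rather than Weka's NNge implementation: the `matches` helper and the example respondent are hypothetical, and a missing attribute is simply treated as a non-match.

```python
# The rule from the slide: each attribute maps to its set of allowed values.
would_buy_rule = {
    "cost": {"10-20"},
    "phone": {"yes"},
    "email": {"yes"},
    "fax": {"no"},
    "chat": {"yes", "no"},
    "other": {"no"},
    "service type": {"Phone_cards_only"},
    "price": {"Somewhat_Dissatisfied", "Somewhat_Satisfied"},
    "voice_quality": {"Somewhat_Dissatisfied", "Somewhat_Satisfied"},
    "service": {"Somewhat_Dissatisfied"},
    "convenience": {"Somewhat_Satisfied"},
    "promotion": {"Somewhat_Dissatisfied"},
    "Know VoIP": {"yes", "no"},
    "marital status": {"Single"},
    "gender": {"Male"},
}

def matches(rule, instance):
    """Return True when every attribute value lies in the rule's allowed set."""
    return all(instance.get(attr) in allowed for attr, allowed in rule.items())

# Hypothetical respondent, invented for illustration only.
respondent = {
    "cost": "10-20", "phone": "yes", "email": "yes", "fax": "no",
    "chat": "no", "other": "no", "service type": "Phone_cards_only",
    "price": "Somewhat_Satisfied", "voice_quality": "Somewhat_Dissatisfied",
    "service": "Somewhat_Dissatisfied", "convenience": "Somewhat_Satisfied",
    "promotion": "Somewhat_Dissatisfied", "Know VoIP": "yes",
    "marital status": "Single", "gender": "Male",
}

covered = matches(would_buy_rule, respondent)
not_covered = matches(would_buy_rule, dict(respondent, gender="Female"))
print(covered, not_covered)
```

Attributes that allow every observed value, such as chat in {yes,no}, do not restrict the rule at all.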
Statistics:
  Class allocation
  Feature weights
Basic-model & Deluxe-model
  Scheme: meta.AttributeSelectedClassifier
  Sub-scheme: rules.NNge
  Selected attributes: 3,6,8,10,11,12 : 6
  Why? To avoid overfitting
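The effect of attribute selection can be pictured as keeping only the listed columns before the rule learner sees the data. The sketch below assumes the indices are 1-based, as in Weka's attribute-selection output; the `project` helper and the placeholder row are hypothetical.

```python
# Attribute indices reported on the slide (assumed to be 1-based).
SELECTED = [3, 6, 8, 10, 11, 12]

def project(row, selected):
    """Keep only the values at the given 1-based attribute positions."""
    return [row[i - 1] for i in selected]

# Placeholder row with 14 values, named a1..a14 for illustration only.
row = ["a%d" % i for i in range(1, 15)]
reduced = project(row, SELECTED)
print(reduced)
```

With fewer attributes, the learned rules have fewer conditions to fit to a small sample, which is the motivation given above for avoiding overfitting.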
Evaluation
  Ten-fold cross-validation
  Summary: correctly classified instances > 85%
  Detailed accuracy by class: TP rate, FP rate, precision, recall, F-measure
  Confusion matrix
  Misclassified instances: 12 of 94
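The per-class figures listed above can all be derived from the confusion matrix. The sketch below shows the arithmetic. Only the totals (94 instances, 12 misclassified) come from the slides; the individual matrix cells and the class name Would_Not_Buy are hypothetical values chosen to match those totals.

```python
def per_class_metrics(matrix, labels):
    """Compute TP rate, FP rate, precision, recall and F-measure per class.

    matrix[i][j] is the number of instances of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                   # actual i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp    # predicted i, actually otherwise
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # same value as the TP rate
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        metrics[label] = {"tp_rate": recall, "fp_rate": fp_rate,
                          "precision": precision, "recall": recall,
                          "f_measure": f_measure}
    return metrics

labels = ["Would_Buy", "Would_Not_Buy"]   # second class name is hypothetical
matrix = [[50, 5],                        # hypothetical cells: 94 instances,
          [7, 32]]                        # 5 + 7 = 12 misclassified

total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
metrics = per_class_metrics(matrix, labels)
print("accuracy: %.2f%%" % (100 * accuracy))
for label in labels:
    print(label, {k: round(v, 3) for k, v in metrics[label].items()})
```

With 12 of 94 instances misclassified, the accuracy is 82/94, which matches the 87.23% reported for NNge.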
Conclusion
Limitations
  Small dataset
  Incomplete data source
Models
  High accuracy rate
  Help further market analysis
  Help product design
Learning Experience
Worked through a real data mining problem
Know classification algorithms better
  Numeric and nominal attributes
  Missing data
  Overfitting
Know evaluation methods better
  How to compare algorithms
  Evaluation factors
Learned how to use Weka
Future work: learn how to modify the source to perform better data mining
Learned from classmates
References
Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2001.
Ian H. Witten and Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations", Morgan Kaufmann, 2000.
Machine Learning: Weka Home Page, http://www.cs.waikato.ac.nz/~ml/index.html
David A. Aaker, V. Kumar and George S. Day, "Marketing Research", eighth edition, Wiley, 2004.
Thank you