dheerajbokde
8/10/2019 Classification Final
SEMINAR ON
A SURVEY OF CLASSIFICATION TECHNIQUES ON BIG DATA
Agenda
Introduction
Classification techniques
Comparative study
Conclusion
References
15-Oct-14
Introduction
Classification is a technique that maps data into predefined classes and groups
Predicts group membership for data instances
Knowledge discovery
Future plan
Predicts categorical class labels
Classifies data (constructs a model) based on the training set and the values (class labels) of the classifying attribute, and uses the model to classify new data
Introduction (Cont.)
Classification
Supervised: human-guided classification; predictive or directed; e.g. decision tree
Unsupervised: calculated by software; descriptive or undirected; e.g. k-means
Classification Techniques
Supervised classification techniques:
Decision Tree
Support Vector Machine
Naive Bayes
Nearest Neighbor
Decision Tree
It is a flowchart-like tree structure that classifies instances by sorting them according to their attribute values
It generates the rules for classifying the dataset
Algorithms
Iterative Dichotomiser 3 (ID3): measures entropy and information gain; top-down procedure
C4.5: pruning through a single-pass algorithm
Classification and Regression Tree (CART): Gini diversity index; constructs a binary decision tree; post-pruning based on cost
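The split measures named on this slide can be illustrated with a short sketch. It computes entropy and information gain (the ID3 measure) and the Gini diversity index (the CART measure) on a tiny made-up dataset; the attribute names `outlook` and `windy` and the labels are hypothetical and do not come from the seminar.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini diversity index of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the data on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

# A top-down tree builder would pick the attribute with the highest gain at each node.
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best, entropy(labels), gini(labels))
```

A full ID3 or CART implementation would apply this choice recursively to each resulting subset and then prune; that part is omitted here.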
Bayesian Network
It is a graphical model of the relationships among a set of variables (features)
Shows high accuracy and speed
Probabilistic learning
Incremental
Probabilistic prediction
Standards
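As a rough illustration of probabilistic learning and prediction, the sketch below implements naive Bayes, the simplest special case of a Bayesian network, in which every feature depends only on the class. This is a stand-in chosen for brevity, not a general Bayesian network; the add-one smoothing and the toy weather data are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the class with the highest posterior log-probability."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one (Laplace) smoothing avoids zero probabilities (an assumption here).
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]

model = train_naive_bayes(rows, labels)
prediction = predict(model, ("rainy", "cool"))
print(prediction)
```

Because the model is only a set of counts, a new training example can be absorbed by incrementing the relevant counters, which is one way to read the "Incremental" point on the slide.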
Nearest Neighbor
Heuristic techniques are used to select a good value of k
It has some strong consistency results
Instance-based classifiers work by storing training records and using them to predict the class label of unseen cases
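The instance-based idea can be sketched in a few lines: the "model" is the stored training records, and prediction is a majority vote among the k closest ones. Euclidean distance, `k=3`, and the sample points are assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored records."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training records: two well-separated groups in the plane.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

result = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
print(result)
```

Every prediction scans all stored records, which is where the storage and computation costs of this technique come from.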
Support Vector Machine (SVM)
It trains a classifier to predict the class of a new sample
Key implementation elements: mathematical programming and kernel functions
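To make the kernel function element concrete, here is a small sketch of three commonly used kernels. The slide does not say which kernels were meant, so the choice of linear, polynomial and RBF kernels, and the default parameters `degree`, `coef0` and `gamma`, are assumptions.

```python
import math


def linear_kernel(x, z):
    """Plain dot product: corresponds to a linear decision boundary."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Dot product raised to a power: allows curved decision boundaries."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: similarity decays with squared distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = (1.0, 2.0), (3.0, 0.5)
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z))
```

A kernel replaces the dot product inside the SVM optimization problem, so a non-linear boundary can be learned without computing the higher-dimensional features explicitly.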
Support Vector Machine (SVM) (Cont.)
Algorithms
Linear: data is linearly separable
Non-linear: not suitable for C class hypothesis
Non-separable: used when noisy data is available
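The linear case can be sketched as follows. This is a simplified illustration that trains a soft-margin linear SVM by stochastic sub-gradient descent on the regularized hinge loss; it is not the mathematical-programming (quadratic programming) formulation usually used in practice. The hyperparameters `lam`, `lr` and `epochs` and the sample points are assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=500):
    """Stochastic sub-gradient descent on lam/2 * ||w||^2 + hinge loss.

    Labels must be +1 or -1. Returns the weight vector w and bias b.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only the regularizer contributes.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical linearly separable data.
points = [(1.0, 1.0), (1.5, 0.5), (0.5, 1.5), (4.0, 4.0), (4.5, 3.5), (3.5, 4.5)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print([svm_predict(w, b, p) for p in points])
```

The hinge loss tolerates points that fall inside the margin, which is how a soft-margin formulation copes with noisy, non-separable data; replacing the dot product with a kernel would give the non-linear variant.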
Comparative Study
Technique | Advantages | Disadvantages
Decision tree | Simple to understand and interpret; requires little data preparation | Locally optimal decisions are made at each node; does not generalize well from the training data
Bayesian network | Able to handle noisy data; well suited for continuous features | Training time will be large; poor interoperability; requires parameters
K-nearest neighbor | Easy to understand and implement as a classification technique | Computational costs are expensive; very sensitive to local data; requires large storage
Support vector machine | Finds the best classification function for the training data; prevents overfitting better than other methods | Computationally expensive; requires large time and storage; poor interpretability of results
Cont.
Algorithm | Predictive Accuracy | Fitting Speed | Prediction Speed | Memory Usage | Easy to Interpret
Trees | Low | Fast | Fast | Low | Yes
SVM | High | Medium | * | * | *
Naive Bayes | Low | ** | ** | ** | Yes
Nearest Neighbor | *** | Fast *** | Medium | High | No
* SVM prediction speed and memory usage are good
** Naive Bayes speed and memory usage are good
*** Nearest neighbor usually has good prediction in low dimensions
Conclusion
Here I have discussed various classification techniques such as Decision tree, Bayesian network, Nearest neighbor and Support vector machine
Decision tree and SVM have different operational profiles: where one is accurate the other is not, and vice versa
References
1. Seema Sharma, Jitendra Agrawal, Shikha Agarwal, Sanjeev Sharma, Machine Learning Te... Data Mining: A Survey, 979-1-4799-1597-2/13, 2013, IEEE
2. Krisztian Balog, Heri Ramampiaro, Cumulative Citation Recommendation: Classification..., ACM 978-1-4503-2034-4/13/07, 2013
3. Mohd Fauzi bin Othman, Thomas Moh Shan Yau, Comparison of Different Classification T... Using WEKA for Breast Cancer, 520-523, Springer-Verlag Berlin Heidelberg 2007
4. Francesco Ricci, Lior Rokach, Bracha Shapira, Paul B. Kantor, Recommender System Handb..., ISBN 978-0-387-85819-7, Springer Science + Business Media, LLC 2011