Classification Final

  • 8/10/2019 Classification Final

    SEMINAR ON

    A SURVEY OF CLASSIFICATION TECHNIQUES ON BIG DATA

    Agenda

    Introduction

    Classification techniques

    Comparative study

    Conclusion

    References

    15-Oct-14

    Introduction

    Classification is the technique that maps data into predefined classes or groups; it predicts group membership for data instances.

    Uses: knowledge discovery and future planning

    Predicts categorical class labels

    Classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses the model to classify new data

    Introduction (Cont.)

    Classification can be:

    Supervised: human-guided classification; predictive or directed (example: decision tree)

    Unsupervised: calculated by software; descriptive or undirected (example: k-means)
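To make the unsupervised branch concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in Python. It assumes numeric tuples compared by Euclidean distance and initial centroids drawn at random from the data; the function name `kmeans` and its parameters are illustrative, not taken from any particular library. Unlike the supervised techniques that follow, it never sees class labels: it only groups similar points.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means (illustrative sketch): returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

The cluster numbers themselves are arbitrary; what matters is that the three points near (1, 1) end up in one group and the three near (8.5, 8.5) in the other.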

    Classification Techniques

    Supervised classification techniques:

    Decision Tree

    Support Vector Machine

    Naïve Bayes

    Nearest Neighbor

    Decision Tree

    A decision tree is a flowchart-like tree structure that classifies instances by sorting them according to their attribute values.

    It generates rules for classifying the dataset.

    Algorithms: Iterative Dichotomiser 3 (ID3), C4.5, and Classification and Regression Tree (CART)

    Algorithm | Measure | Procedure | Pruning
    ID3, C4.5 | Entropy, information gain | Top-down construction | Pruning through a single-pass algorithm
    CART | Gini diversity index | Constructs a binary decision tree | Post-pruning based on cost
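The split measures named on this slide can be computed directly: entropy H(S) = -Σ p_c log2 p_c, information gain = H(S) minus the size-weighted entropy of the subsets after a split, and the Gini index = 1 - Σ p_c². Below is a minimal Python sketch, assuming categorical attributes stored in dictionaries; the helper names are illustrative. `best_attribute` shows the greedy, top-down step that ID3 repeats at every node. C4.5's gain-ratio refinement, CART's binary splits, and pruning are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini diversity index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting on one categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Greedy top-down step: choose the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]
print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```

In this toy data `outlook` separates the classes perfectly (gain 1 bit) while `windy` tells us nothing (gain 0), so the tree would split on `outlook` first.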

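    Bayesian Network

    A Bayesian network is a probabilistic graphical model over a set of variables (features).

    Bayesian classifiers show high accuracy and speed.

    Probabilistic learning: computes explicit probabilities for hypotheses

    Incremental: each training example can increase or decrease the estimated probability that a hypothesis is correct

    Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities

    Standard: provides a standard of optimal decision making against which other methods can be measured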
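Naïve Bayes is the simplest Bayesian network used for classification: the class is the only parent of every feature, so features are assumed conditionally independent given the class. The Python sketch below is a hypothetical `NaiveBayes` class, assuming categorical features and add-one (Laplace) smoothing. It illustrates two of the properties listed on this slide: training only updates counts, so learning is incremental, and prediction returns a probability for every class.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self):
        self.class_counts = Counter()           # class -> number of examples
        self.feature_counts = Counter()         # (feature index, value, class) -> count
        self.feature_values = defaultdict(set)  # feature index -> distinct values seen

    def update(self, features, label):
        """Add one training example; earlier examples are never revisited."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[(i, value, label)] += 1
            self.feature_values[i].add(value)

    def predict_proba(self, features):
        """Posterior P(class | features) for every class seen so far."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior P(class)
            for i, value in enumerate(features):
                n_values = len(self.feature_values[i])
                # Smoothed likelihood: the +1 keeps unseen values from giving zero.
                score += math.log((self.feature_counts[(i, value, label)] + 1) / (count + n_values))
            log_scores[label] = score
        # Normalise in log space to avoid underflow.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(weights.values())
        return {label: w / z for label, w in weights.items()}

    def predict(self, features):
        """Most probable class."""
        probs = self.predict_proba(features)
        return max(probs, key=probs.get)


nb = NaiveBayes()
for features, label in [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
                        (("rain", "mild"), "yes"), (("rain", "cool"), "yes"),
                        (("overcast", "hot"), "yes")]:
    nb.update(features, label)
print(nb.predict(("rain", "mild")))  # yes
```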

    Nearest Neighbor

    Instance-based classifiers work by storing the training records and using them to predict the class label of unseen cases.

    Heuristic techniques are used to select a good value of k.

    The method has some strong consistency results.
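Nearest-neighbor classification is simple enough to write out in full. The Python sketch below assumes numeric feature tuples and Euclidean distance; `knn_predict` and `choose_k` are illustrative names. `choose_k` is one example of a heuristic for selecting k (leave-one-out accuracy over a few candidate values); it is not the only heuristic in use.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training records closest to `query`."""
    # No model is fitted: every stored record is scanned at prediction time.
    ranked = sorted(zip(train_points, train_labels), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # On a tied vote, the class that appears first in distance order wins.
    return votes.most_common(1)[0][0]


def choose_k(points, labels, candidates=(1, 3, 5)):
    """Heuristic: the candidate k with the best leave-one-out accuracy."""
    def loo_accuracy(k):
        hits = 0
        for i in range(len(points)):
            rest_points = points[:i] + points[i + 1:]
            rest_labels = labels[:i] + labels[i + 1:]
            hits += knn_predict(rest_points, rest_labels, points[i], k) == labels[i]
        return hits / len(points)
    return max(candidates, key=loo_accuracy)


pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labs = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(pts, labs, (0.5, 0.5)))  # a
```

Because every prediction scans all stored records, the cost grows with the training set, which is the "computationally expensive, requires large storage" trade-off noted in the comparison.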

    Support Vector Machine (SVM)

    An SVM trains a classifier to predict the class of a new sample.

    Key implementation components: mathematical programming and kernel functions
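Kernel functions let an SVM work in a high-dimensional feature space without building that space explicitly. The Python sketch below uses illustrative function names. It defines three common kernels (linear, polynomial, Gaussian RBF), checks the kernel trick for the degree-2 polynomial kernel against its explicit feature map, and evaluates the SVM decision function f(x) = Σ αᵢ yᵢ K(xᵢ, x) + b. The coefficients αᵢ and b would normally come from solving the SVM's quadratic program (the mathematical-programming part), which is not shown. The values used here are the exact hard-margin solution for a hypothetical two-point dataset: (1, 1) labelled +1 and (-1, -1) labelled -1.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, c=0.0):
    """K(x, z) = (x . z + c) ** degree"""
    return (linear_kernel(x, z) + c) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def feature_map_deg2(x):
    """Explicit map for 2-D inputs: phi(x) . phi(z) == polynomial_kernel(x, z, 2, 0)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def svm_decision(x, support_vectors, alphas, sv_labels, b, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; the predicted class is sign(f(x))."""
    return sum(a * y * kernel(sv, x) for sv, a, y in zip(support_vectors, alphas, sv_labels)) + b


# Kernel trick check: both values equal 121 (the second up to floating-point rounding).
print(polynomial_kernel((1, 2), (3, 4)))
print(linear_kernel(feature_map_deg2((1, 2)), feature_map_deg2((3, 4))))

f = svm_decision((2, 0), [(1, 1), (-1, -1)], [0.25, 0.25], [1, -1], 0.0, linear_kernel)
print(f)  # 1.0, so (2, 0) is assigned to class +1
```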

    Support Vector Machine (SVM) (Cont.)

    Algorithm | Use
    Linear | Data is linearly separable
    Non-linear | Not suitable for a C-class hypothesis
    Non-separable | Noisy data is available
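For the non-separable case with noisy data, the usual approach is the soft-margin SVM, which lets some points violate the margin at a cost weighted by a penalty parameter C. The objective is ½‖w‖² + C Σ max(0, 1 - yᵢ(w·xᵢ + b)). The Python sketch below trains a linear soft-margin SVM by batch subgradient descent on that objective. This optimizer is a simple stand-in chosen for brevity; practical solvers usually work on the dual quadratic program or use coordinate-descent methods. Function names, the learning rate, and the epoch count are illustrative assumptions, and labels are assumed to be -1/+1.

```python
def train_soft_margin_svm(points, labels, C=1.0, lr=0.01, epochs=200):
    """Linear soft-margin SVM via batch subgradient descent (illustrative sketch).

    Minimises 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)), y_i in {-1, +1}.
    """
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regulariser 0.5 * ||w||^2
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for j in range(dim):
                    grad_w[j] -= C * y * x[j]
                grad_b -= C * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Two well-separated groups plus one noisy +1 point sitting among the -1 points.
positives = [(2, 2), (3, 3), (2, 3), (3, 2)]
negatives = [(-2, -2), (-3, -3), (-2, -3), (-3, -2)]
X = positives + negatives + [(-2.5, -2.5)]
y = [1] * 4 + [-1] * 4 + [1]
w, b = train_soft_margin_svm(X, y)
print([svm_predict(w, b, p) for p in positives + negatives])
```

Because the noisy point cannot be separated, its hinge term stays active and the optimizer accepts misclassifying it rather than tilting the boundary toward it; a larger C penalizes such violations more heavily.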

    Comparative Study

    Technique | Advantages | Disadvantages
    Decision tree | Simple to understand and interpret; requires little data preparation | Locally optimal decisions are made at each node; may not generalize well from the training data
    Bayesian network | Able to handle noisy data; well suited for continuous features | Long training time; poor interpretability; requires parameters
    K-nearest neighbor | Easy to understand; easy-to-implement classification technique | Computationally expensive; very sensitive to the local structure of the data; requires large storage
    Support vector machine | Finds the best classification function for the training data; less prone to overfitting than other methods | Computationally expensive; requires much time and storage; poor interpretability of results

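    Comparative Study (Cont.)

    Algorithm | Predictive accuracy | Fitting speed | Prediction speed | Memory usage | Easy to interpret
    Trees | Low | Fast | Fast | Low | Yes
    SVM | High | Medium | * | * | *
    Naïve Bayes | Low | ** | ** | ** | Yes
    Nearest neighbor | *** | Fast *** | Medium | High | No

    * SVM prediction speed and memory usage are good when there are few support vectors, but can be poor when there are many.

    ** Naïve Bayes speed and memory usage are good for simple distributions, but can be poor for kernel distributions and large data sets.

    *** Nearest neighbor usually has good predictions in low dimensions, but can have poor predictions in high dimensions.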

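    Conclusion

    Here I have discussed various classification techniques: decision tree, Bayesian network, nearest neighbor, and support vector machine.

    Decision trees and SVMs have different operational profiles: where one is accurate, the other is not, and vice versa. For example, SVMs offer higher predictive accuracy, while decision trees are faster to fit and easier to interpret.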

    References

    1. Seema Sharma, Jitendra Agrawal, Shikha Agarwal, Sanjeev Sharma, Machine Learning Techniques for Data Mining: A Survey, 979-1-4799-1597-2/13, IEEE, 2013

    2. Krisztian Balog, Heri Ramampiaro, Cumulative Citation Recommendation: Classification, 978-1-4503-2034-4/13/07, ACM, 2013

    3. Mohd Fauzi bin Othman, Thomas Moh Shan Yau, Comparison of Different Classification Techniques Using WEKA for Breast Cancer, pp. 520-523, Springer-Verlag Berlin Heidelberg, 2007

    4. Francesco Ricci, Lior Rokach, Bracha Shapira, Paul B. Kantor, Recommender Systems Handbook, ISBN 978-0-387-85819-7, Springer Science+Business Media, LLC, 2011
