Transcript
  • Sentiment Analysis in Healthcare: a case study using survey responses
  • Outline 1) Sentiment analysis & healthcare 2) Existing tools 3) Conclusions & Recommendations
  • Focus on Healthcare 1) Difficult field: biomedical text 2) Potential improvements. Relevant research: NLP procedure for FHF prediction (Roy et al., 2013); TPA: Who Is Sick & Google Flu Trends (Maged et al., 2010); BioTeKS: analysing biomedical text (Mack et al., 2004)
  • Sentiment Analysis Opinions, thoughts & feelings; used to extract information from raw data
  • Sentiment Analysis Examples Surveys: analyse open-ended questions; business & governments: assist in the decision-making process & monitor negative communication; consumer feedback: analyse reviews; health: analyse biomedical text
  • Aims & Objectives Can existing Sentiment Analysis tools respond to the needs of any healthcare-related matter? Is it possible to accurately replicate human language using machines?
  • The case study details 8 survey questions (open- & closed-ended). Analysed 137 responses to the question: What is your feedback? Commercial tools: Semantria & TheySay. Non-commercial tools: Google Prediction API & WEKA
  • Survey Overview [chart: number of responses by score (1-5) for Q.1 navigation, Q.2 finding information, Q.3 website's appeal, Q.6 satisfaction, Q.8 recommend website]
  • Semantria Features: collection analysis, categories, classification analysis & entity recognition
  • TheySay Features: document sentiment, sentence sentiment, POS, comparison detection, humour detection, speculation analysis, risk analysis & intent analysis
  • Commercial Tools Results Semantria: 39 positive, 51 neutral, 47 negative. TheySay: 45 positive, 8 neutral, 84 negative
  • Introducing a Baseline [chart: number of responses by score for Q.1, Q.2, Q.3, Q.6, Q.8] Neutral classification guidelines: equally positive & negative, factual statements, irrelevant statements. Class score ranges: Positive 1-2.7, Neutral 2.8-4.2, Negative 4.3-5
  • Introducing a Baseline Example: response CG 102, "not available". Scores: Q.1 = 3, Q.2 = 5, Q.3 = 4, Q.6 = 5, Q.8 = 5; average 4.4, hence polarity class Negative. But it is a factual statement (positive or negative?), so the final label is Neutral
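The score-to-class rule on these two slides is simple enough to sketch. The helper below is a hypothetical reconstruction, assuming the quoted ranges (Positive 1-2.7, Neutral 2.8-4.2, Negative 4.3-5) and the averaging shown in the CG 102 example; the manual override for factual statements is applied by hand, as on the slide.

```java
public class Baseline {
    // Average the five question scores, then bucket by the quoted
    // ranges. With five whole-number scores the average moves in
    // steps of 0.2, so the gaps between ranges never occur.
    static String classify(double[] scores) {
        double sum = 0;
        for (double s : scores) sum += s;
        double avg = sum / scores.length;
        if (avg <= 2.7) return "Positive"; // 1 - 2.7
        if (avg <= 4.2) return "Neutral";  // 2.8 - 4.2
        return "Negative";                 // 4.3 - 5
    }

    public static void main(String[] args) {
        // Response CG 102: Q.1=3, Q.2=5, Q.3=4, Q.6=5, Q.8=5
        System.out.println(classify(new double[]{3, 5, 4, 5, 5})); // avg 4.4 -> Negative
    }
}
```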
  • Introducing a Baseline Manually classified responses: 24 positive, 18 neutral, 95 negative
  • Google Prediction API 1) Pre-process the data: punctuation & capital removal, account for negation 2) Separate into training and testing sets 3) Insert pre-labelled data 4) Train model 5) Test model 6) Cross validation: 4-fold 7) Compare with baseline
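Step 1 of this pipeline (punctuation & capital removal, accounting for negation) could look like the sketch below. The NOT_ prefixing of the token after a negation word is one common scheme and is an assumption here; the slides do not say how negation was actually encoded.

```java
import java.util.ArrayList;
import java.util.List;

public class Preprocess {
    // Strip punctuation, lower-case, and mark the token following a
    // negation word with a NOT_ prefix (assumed scheme).
    static String clean(String response) {
        String[] tokens = response.toLowerCase()
                .replaceAll("[^a-z'\\s]", " ")
                .trim()
                .split("\\s+");
        List<String> out = new ArrayList<>();
        boolean negate = false;
        for (String t : tokens) {
            if (t.equals("not") || t.equals("no") || t.equals("never") || t.endsWith("n't")) {
                out.add(t);
                negate = true;          // flag the next token
            } else {
                out.add(negate ? "NOT_" + t : t);
                negate = false;
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(clean("The website is not easy to navigate."));
        // -> "the website is not NOT_easy to navigate"
    }
}
```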
  • Google Prediction API Results 5 122 10 Classification Results Neutral Negative Positive
  • WEKA 1) Separate into training and testing sets 2) Choose graphical user interface: the Explorer 3) Insert pre-labelled data 4) Pre-process the data: punctuation, capital & stopword removal, and alphabetic tokenisation
  • WEKA 5) Consider resampling: whether a balanced dataset is preferred 6) Choose classifier: Naive Bayes 7) Classify using cross validation: 4-fold
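Steps 3-7 map directly onto WEKA's Java API. The snippet below is a minimal sketch, not the authors' exact setup: the file name responses.arff is hypothetical, and stopword removal is omitted for brevity. It uses StringToWordVector with the AlphabeticTokenizer, a Naive Bayes classifier, and 4-fold cross-validation as on the slides.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.tokenizers.AlphabeticTokenizer;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class WekaPipeline {
    public static void main(String[] args) throws Exception {
        // Load the pre-labelled responses (hypothetical ARFF file).
        Instances data = DataSource.read("responses.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Pre-process: lower-case, alphabetic tokens.
        StringToWordVector bow = new StringToWordVector();
        bow.setLowerCaseTokens(true);
        bow.setTokenizer(new AlphabeticTokenizer());
        bow.setInputFormat(data);
        Instances vectors = Filter.useFilter(data, bow);

        // Classify with Naive Bayes under 4-fold cross-validation.
        Evaluation eval = new Evaluation(vectors);
        eval.crossValidateModel(new NaiveBayes(), vectors, 4, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println("Kappa: " + eval.kappa());
    }
}
```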
  • WEKA Results Resampling: 10% increase in precision & 6% increase in accuracy. Overall, 82% correctly classified
  • The tools Semantria: score in the range -2 to 2; TheySay: three percentages for negative, positive & neutral; Google Prediction API: three values for negative, positive & neutral; WEKA: percentage of correctly classified responses
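To compare these heterogeneous outputs against one manually labelled baseline, each must be reduced to a single class. A hypothetical normalisation is sketched below; the +/-0.05 neutral band for Semantria is an assumed threshold, not one stated on the slides.

```java
public class Normalise {
    // Map Semantria's -2..2 score to a class label; the neutral
    // band around zero is an assumption for this sketch.
    static String fromSemantria(double score) {
        if (score > 0.05) return "Positive";
        if (score < -0.05) return "Negative";
        return "Neutral";
    }

    // For TheySay / Google Prediction API, take the arg-max of the
    // three per-class values (percentages or scores).
    static String fromTriple(double pos, double neu, double neg) {
        if (pos >= neu && pos >= neg) return "Positive";
        if (neg >= neu) return "Negative";
        return "Neutral";
    }
}
```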
  • Evaluation Tool accuracy. Commercial tools: Semantria 51.09%, TheySay 68.61%. Non-commercial tools: Google Prediction API 72.25%, WEKA 82.35%
  • Evaluation Kappa statistic & F-measure. Semantria: 0.2692, 0.550; TheySay: 0.3886, 0.678; Google Prediction API: 0.2199, 0.628; WEKA: 0.5735, 0.809
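For reference, the two metrics in this table have their standard definitions, where \(p_o\) is the observed agreement between a tool and the manual labels, \(p_e\) the agreement expected by chance, and \(P\), \(R\) are precision and recall:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
F = \frac{2 \, P \, R}{P + R}
```

Kappa corrects raw agreement for chance, which is why WEKA's 0.5735 is a stronger result than its 82.35% accuracy alone suggests on this imbalanced (95-negative) dataset.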
  • Evaluation [chart: comparison of precision by class (negative, neutral, positive) for Semantria, TheySay, Google API & WEKA]
  • Evaluation [chart: comparison of recall by class (negative, neutral, positive) for Semantria, TheySay, Google API & WEKA]
  • Evaluation: Single-sentence responses Accuracy on all responses vs. single-sentence responses. Commercial tools: Semantria 51.09% vs. 53.49%, TheySay 68.61% vs. 72.09%. Non-commercial tools: Google Prediction API 72.25% vs. 54%, WEKA 82.35% vs. 70%
  • Conclusions Semantria: business use; TheySay: preparing for competition & academic research; Google Prediction API: classification; WEKA: extraction & classification in healthcare
  • Conclusions Commercial tools: easy to use & provide results quickly; non-commercial tools: time-consuming but more reliable
  • Conclusions Is it possible to accurately replicate human language using machines? Approx. 70% accuracy for all tools (except Semantria); WEKA: most powerful tool
  • Conclusions Can existing SA tools respond to the needs of any healthcare-related matter? Commercial tools cannot respond; non-commercial tools can be trained to do so
  • Limitations Only four tools; small dataset; potential errors in manual classification; detailed analysis of single-sentence responses was omitted
  • Recommendations Examine the reliability of other commercial tools; investigate other non-commercial tools, especially NLTK and GATE; examine other classifiers (SVM & MaxEnt); investigate all of WEKA's GUIs
  • Recommendations Verify labels using more people; label each sentence as well as the whole response; investigate whether negativity is associated with long responses
  • Questions
