Machine Learning in Practice
Lecture 9

Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Plan for the Day

Announcements
  Questions?
  Assignment 4
  Quiz
Today’s Data Set: Speaker Identification
Weka helpful hints
  Visualizing Errors for Regression Problems
  Alternative forms of cross-validation
  Creating Train/Test Pairs
Intro to Evaluation
Speaker Identification
Today’s Data Set – Speaker Identification
Preprocessing Speech
Record speech to WAV files.
Extract a variety of acoustic and prosodic features.
Predictions: which algorithm will perform better?
What previous data set does this remind you of?
J48: .53 Kappa
SMO: .37 Kappa
Naïve Bayes: .16 Kappa
Notice Ranges and Contingencies
Most Predictive Feature
Least Predictive Feature
What would 1R do?
What would 1R do? Using only the single most predictive feature, 1R gets .16 Kappa.
Weka Helpful Hints
Evaluating Numeric Prediction: CPU data
Visualizing Classifier Errors for Numeric Prediction
Creating Train/Test Pairs
First click here
Creating Train/Test Pairs
If you pick unsupervised, you’ll get non-stratified folds; otherwise you’ll get stratified folds.
Stratified versus Non-Stratified

Weka’s standard cross-validation is stratified
  Data is randomized before dividing it into folds
  Preserves the distribution of class values across folds
  Reduces variance in performance
Unstratified cross-validation means there is no randomization
  Order is preserved
  Advantage for matching predictions with instances in Weka (see the sketch below)
Stratified versus Non-Stratified

Leave-one-out cross-validation (sketched below)
  Train on all but one instance; iterate over all instances
  An extreme version of unstratified cross-validation: if the test set has only one instance, the distribution of class values cannot be preserved
  Maximizes the amount of data used for training on each fold
Stratified versus Non-Stratified

Leave-one-subpopulation-out (see the sketch below)
  Use it when you have several data points from the same subpopulation, e.g., speech data from the same speaker
  Otherwise you may have data from the same subpopulation in both train and test, and that overlap inflates your performance estimate
  When is this not a problem? When you can manually make sure it won’t happen; you have to do that by hand
Creating Train/Test Pairs
Now click here
Creating Train/Test Pairs
You’re going to run this filter 20 times altogether: twice for every fold.
Creating Train/Test Pairs
True for Train, false for Test
Creating Train/Test Pairs
If you’re doing Stratified, make sure you have the class attribute selected here.
Creating Train/Test Pairs
1. Click Apply
Creating Train/Test Pairs
2. Save the file
Creating Train/Test Pairs
3. Undo before you create the next file
Doing Manual Train/Test

* First load the training data on the Preprocess tab
Doing Manual Train/Test
* Now select Supplied Test Set as the Test Option
Doing Manual Train/Test
Then click Set
Doing Manual Train/Test

* Next load the test set
Doing Manual Train/Test
* Then you’re all set, so click on Start
Evaluation Methodology
Intro to Chapter 5

Many techniques illustrated in Chapter 5 (ROC curves, recall-precision curves) don’t show up in applied papers
  They are useful for showing trade-offs between properties of different algorithms
  You see them in theoretical machine learning papers
Intro to Chapter 5

Still important to understand what they represent
  The thinking behind the techniques will show up in your papers
  You need to know what your numbers do and don’t demonstrate
  They give you a unified framework for thinking about machine learning techniques
  There is no cookie cutter for a good evaluation
Confidence Intervals

Mainly important if there is some question about whether your data set is big enough
You average your performance over 10 folds, but how certain can you be that the number you got is correct?
We saw before that performance varies from fold to fold
[Figure: a single confidence interval drawn as a bracketed range on a number line from 0 to 40]
Confidence Intervals

We know that the distribution of categories found in the training set and in the testing set affects the performance
Performance on two different sets will not be the same
Confidence intervals allow us to say that the probability of the real performance value being within a certain range of the observed value is 90%
Confidence Intervals

Confidence limits come from the normal distribution
Computed in terms of number of standard deviations from the mean
If the data is normally distributed, there is about a 16% chance of the real value being more than 1 standard deviation above the mean
What is a significance test?

How likely is it that the difference you see occurred by chance?
How could the difference occur by chance?
[Figure: two overlapping confidence intervals drawn on a number line from 0 to 40]
If the mean of one distribution is within the confidence interval of another, the difference you observe could be by chance.

If you want p < .05, you need the 90% confidence intervals. Find the corresponding z scores from a standard normal distribution table.
Computing Confidence Intervals

A 90% confidence interval corresponds to z = 1.65
  5% chance that a data point will occur to the right of the rightmost edge of the interval
f = percentage of successes, N = number of trials
p = (f + z²/2N ± z·√(f/N − f²/N + z²/4N²)) / (1 + z²/N)
Example: f = 75%, N = 1000, c = 90% → [0.727, 0.773] (see the sketch below)
Significance Tests

If you want to know whether the difference in performance between Approach A and Approach B is significant
  Get performance numbers for A and B on each fold of a 10-fold cross-validation
  You can use the Experimenter, or you can do the computation in Excel or Minitab
  If you use exactly the same folds across approaches, you can use a paired t-test rather than an unpaired t-test (sketched below)
Significance Tests

Don’t forget that you can get a significant result by chance! The Experimenter corrects for multiple comparisons
Significance tests are less important if you have a large amount of data and the difference in performance between approaches is large
Using the Experimenter

* First click New
Using the Experimenter
Make sure Simple is selected
Using the Experimenter
Select .csv as the output file format and click on Browse
Enter file name
Click on Add New
Using the Experimenter
Load data set
Using the Experimenter
10 repetitions is better than 1, but 1 is faster.
Using the Experimenter
Click on Add New to add algorithms
Using the Experimenter
Click Choose to select an algorithm
Using the Experimenter
You should add Naïve Bayes, SMO, and J48
Using the Experimenter
Then click on the Run tab
Using the Experimenter
Click on Start
Using the Experimenter
When it’s done, click on Analyze
Using the Experimenter
Click File to load the results file you saved
Using the Experimenter
Do Analysis
* Explicitly select default settings here
* Then select Kappa here
* Then select Perform Test
Do Analysis

* The base case is what you are comparing against
CSV Output
Analyze with Minitab
More Complex Statistical Analyses
I put a Minitab manual in the Readings folder on Blackboard.
Take Home Message

We focused on practical, methodological aspects of the topic of evaluation
We talked about the concept of a confidence interval and about significance tests
We learned how to create train/test pairs for manual cross-validation, which is useful when preparing for an error analysis
We also learned how to use the Experimenter to run experiments and significance tests