Machine Learning in 25 minutes or less
And why the HotOS folks should care...
Terran Lane, Dept. of Computer Science, University of New Mexico
Machine learning is the study of algorithms or systems that improve their performance in response to experience.
The core ML problem

[Diagram, built up over several slides: The World → Sensors → X → Model f(X) → prediction ŷ → Performance measure → assessment. The true output y also feeds the performance measure, giving L(ŷ,y); when the prediction instead acts back on the world as a control/response, the measure becomes L(ŷ,X’).]

•The world: network; CPU; program memory footprint; user activity; multi-process performance
•Sensor readings (X): latency; bandwidth; branches taken; cache misses; memory allocs; object age; keystroke rates; recent commands; process throughput; cache activity; synch delays
•Predictions (ŷ): compression/redundancy rates; branch prediction; object lifetime; legitimate/hostile; normal/abnormal
•Performance measures L(ŷ,y): accuracy (0/1 loss); squared error; time-to-response
•Control/response measures L(ŷ,X’): correctness; stability; robustness; total system performance (throughput, latency, etc.)
•Without a prediction to score: ??? Do you like the model? Does it make sense? Does it make you feel warm and fuzzy?

The ML job: find the model f(X)... so that the assessment is as good as possible.
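The sense → predict → assess loop above can be sketched in a few lines. Everything here is an invented placeholder (the sensor values, the threshold model, the labels), purely to make the roles of X, f(X), ŷ, and L(ŷ,y) concrete:

```python
# Minimal sketch of the sense -> predict -> assess loop.
# All names and values (read_sensors, the threshold in f, the labels)
# are illustrative placeholders, not any real system's API.

def read_sensors():
    """Pretend sensor vector X: (latency_ms, cache_miss_rate)."""
    return (12.0, 0.07)

def f(X):
    """Model: flag 'abnormal' if latency is high or misses are frequent."""
    latency_ms, miss_rate = X
    return "abnormal" if latency_ms > 10.0 or miss_rate > 0.2 else "normal"

def L(y_hat, y):
    """0/1 loss: 0 if the prediction matches the true label, else 1."""
    return 0 if y_hat == y else 1

X = read_sensors()                  # the world, seen through sensors
y_hat = f(X)                        # the model's prediction ŷ
assessment = L(y_hat, "abnormal")   # compare against ground truth y
```

The ML job, in these terms: choose f so that the assessment is as good as possible over the data you expect to see.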
Types of learning
•Supervised
•Reinforcement learning
•Unsupervised
•Special cases:
•Semi-supervised
•Anomaly detection
•Behavioral cloning
•etc...
Supervised Learning
•Characteristics:
•Measure features/sensor values ⇒ X
•Want to predict system “output”, y
•Have some source of example (X,y) pairs
•System, human-labeling, etc.
•Have a well-defined performance criterion
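As a toy instance of this setup, a minimal 1-nearest-neighbor classifier over (X, y) pairs. The feature names and labels are invented for illustration; distance is plain Euclidean:

```python
# Toy 1-nearest-neighbor classifier: predict the label of the closest
# training point. Data below are invented (latency_ms, cache_miss_rate)
# feature vectors with made-up legitimate/hostile labels.
import math

def nn_predict(train, x):
    """train: list of (X, y) pairs; x: feature tuple to classify."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    _, label = min(train, key=lambda pair: dist(pair[0], x))
    return label

train = [((2.0, 0.01), "legitimate"),
         ((3.0, 0.02), "legitimate"),
         ((40.0, 0.30), "hostile"),
         ((55.0, 0.25), "hostile")]

pred = nn_predict(train, (45.0, 0.28))   # nearest neighbor is hostile
```

Note this matches the slide's caveat: k-nn needs no training phase at all, but every prediction scans the whole training set, hence "slow".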
Example sup. learners
•Discriminative: only produces a classifier
•Decision tree: fast; comprehensible models
•Support vector machine: high dim data; accurate
•Nearest-neighbor / k-nn: low-dim data; slow
•Neural net: closely related to SVMs
•Generative: produces complete probability model
•Naive Bayes: very simple; surprisingly accurate
•Bayesian network: powerful; descriptive; accurate
•Markov random field: closely related to BNs
•Meta-learners/ensemble methods: sets of models
•Boosting
•Bagging
•Winnow
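To make the discriminative/generative split concrete, here is a minimal Bernoulli Naive Bayes: it builds a full probability model P(y) and P(xᵢ|y) over binary features, then predicts the most probable label. All data are invented, and Laplace smoothing is used to avoid zero probabilities:

```python
# Minimal Bernoulli Naive Bayes over binary features:
# learn P(y) and P(x_i = 1 | y), predict argmax_y P(y) * prod_i P(x_i | y).
import math
from collections import defaultdict

def nb_train(data):
    """data: list of (features, label); features is a tuple of 0/1 values."""
    counts = defaultdict(int)                     # label -> #examples
    feat = defaultdict(lambda: defaultdict(int))  # label -> i -> #(x_i == 1)
    for x, y in data:
        counts[y] += 1
        for i, v in enumerate(x):
            feat[y][i] += v
    return counts, feat, len(data)

def nb_predict(model, x):
    counts, feat, n = model
    best, best_lp = None, -math.inf
    for y, c in counts.items():
        lp = math.log(c / n)                      # log prior P(y)
        for i, v in enumerate(x):
            p1 = (feat[y][i] + 1) / (c + 2)       # Laplace-smoothed P(x_i=1|y)
            lp += math.log(p1 if v else 1 - p1)
        if lp > best_lp:
            best, best_lp = y, lp
    return best

# Invented binary features, e.g. (unusual_port, off_hours_login)
data = [((1, 1), "hostile"), ((1, 0), "hostile"),
        ((0, 0), "legit"), ((0, 1), "legit"), ((0, 0), "legit")]
model = nb_train(data)
```

"Very simple; surprisingly accurate" shows in the code: training is just counting, yet the conditional-independence assumption is often good enough in practice.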
Key assumption #1
The train/test data reflect the same data distribution that will be experienced when the learned model is embedded in the performance system.
•System not changing over time
•Model doesn’t affect behavior of system
Key assumption #2
All data points are statistically independent.
•No linkage between “adjacent”/“successive” points
•No other process affecting data generation
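One way to see assumption #2 fail in a systems setting: successive sensor readings from a running system are usually autocorrelated. A quick check, using a synthetic AR(1) trace (each reading 0.9 times the previous one, plus noise) standing in for, say, a latency log:

```python
# Lag-1 autocorrelation check: i.i.d. data scores near 0; this synthetic
# temporally-linked trace (invented stand-in for real sensor data) does not.
import random

rng = random.Random(42)
x, xs = 0.0, []
for _ in range(2000):
    x = 0.9 * x + rng.gauss(0, 1)      # each reading depends on the last
    xs.append(x)

mean = sum(xs) / len(xs)
cov = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
var = sum((a - mean) ** 2 for a in xs)
lag1 = cov / var                       # near 0.9 here, near 0 if independent
```

A supervised learner trained on such a trace as if the points were independent will overestimate how much evidence it really has.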
Reinforcement learning
•Characteristics:
•Measure features of system ⇒ X
•Want to control sys. -- model outputs are “knobs”
•Can interact with system/simulation
•Have performance measure that recognizes “good” system behavior
•Don’t need to know “correct” control actions
Key criterion
•Are the sensor readings enough to completely characterize the state of the system?
•Knowing X tells you everything relevant
•Yes:
•“Fully observable”
•Learning optimal performance fairly tractable (*)
•No (multiple system states produce same X):
•“Partially observable”
•Learning barely satisfactory performance incredibly difficult (PSPACE-complete. Or worse.)
RL: The good news
•It does everything that traditional control doesn’t!
•Stochasticity ok
•Don’t need a model
•Don’t need linearity
•Discrete time ok
•No messy ODEs or z transforms!
•Delay ok
RL: The bad news
•Low dimensions
•Discrete variables/features
•Need to know state space
•Convergence can be slow
•Glacial
•Optimal control can be intractable
Example RL
•Fully observable systems
•Q-learning
•SARSA
•Dyna
•E³
•Partially observable
•REINFORCE
•Utile distinction memories
•Policy gradient methods
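For the fully observable case, tabular Q-learning fits in a few lines. A sketch on a tiny invented chain MDP (states 0..3, actions step-left/step-right, reward only at the right end); the hyperparameters are arbitrary illustrative choices:

```python
# Tabular Q-learning on a toy 4-state chain: learn to walk right to the
# reward at state 3. Fully observable, discrete states and actions.
import random

N_STATES, ACTIONS = 4, (-1, +1)        # actions: step left / step right
alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng = random.Random(0)

def step(s, a):
    """Environment: clamp to [0, N_STATES-1]; reward 1 on reaching the end."""
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

for _ in range(500):                   # episodes
    s = 0
    while s != N_STATES - 1:
        if rng.random() < eps:         # epsilon-greedy exploration
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_b Q(s',b)
        target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# Greedy policy after learning: the best action in each non-terminal state
policy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)]
```

Note what was *not* needed: no model of the environment's dynamics, no linearity, and the reward is delayed; this is exactly the "good news" list. The "bad news" list also shows: the Q table grows with |states| × |actions|, which is why low dimensions and discrete features matter.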
Key difference #1
Unlike supervised learning...
Distinct data points can be temporally correlated.
•Key parameter: how much history is necessary to characterize the system?
•Markov order
•1 time unit? 2? All of them?
Key difference #2
Unlike supervised learning...
The model is expected to influence the behavior of the system.
•It’s a good thing...
References (partial)
•General:
•Mitchell, Machine Learning, McGraw-Hill, 1997.
•Duda, Hart, & Stork, Pattern Classification, Wiley, 2001.
•Hastie, Tibshirani, & Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
•Software (general; mostly supervised):
•Weka: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/
References (partial)
•Decision trees:
•Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
•Breiman, Friedman, Olshen, & Stone, Classification and Regression Trees (CART), Wadsworth, 1984.
•Support vector machines:
•Burges, “A Tutorial on Support Vector Machines for Pattern Recognition”, Data Mining and Knowledge Discovery, 2(2), 1998.
•Software: SVMlight. http://svmlight.joachims.org/
References (partial)
•Reinforcement learning:
•Sutton & Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
•Kaelbling, Littman, & Moore, “Reinforcement Learning: A Survey”, Journal of Artificial Intelligence Research, 4, 1996.
•Kaelbling, Littman, & Cassandra, “Planning and Acting in Partially Observable Stochastic Domains”, Artificial Intelligence, 101, 1998.
Thank you!
Questions?
ML keywords
•Learning
•Adaptive
•Self-tuning
•State estimation
•Parameter estimation
•Data mining
•Computational statistics
•Predictive modeling
•Pattern recognition
•etc...
The Learning Loop

[Diagram: the same loop as before -- The World → Sensors → X → Model f(X) → ŷ, scored by the Performance measure L(ŷ,y) with y supplied, producing an assessment -- with a Learning module added: the world generates “training” data, and the learning module consumes that data plus the performance measure to produce the model f(X).]
The training process
•Gather a large set of “training” data: Dtrain = [ (X1,y1), (X2,y2), ..., (Xn,yn) ]
•Also a large set of “testing” (eval; holdout) data: Deval = [ (X1,y1), ..., (Xm,ym) ]
•Apply the learner to the training set to get a model: f() = learn(Dtrain, L)
•Evaluate results on the test set: [ ŷtest ] = f(Xtest); assessment = L(ŷtest, ytest)