1. Machine Learning: Overview
- PD Dr. Gabriella Kókai, [email_address]
- Friedrich-Alexander-Universität, Lehrstuhl für Informatik 2, Room 04.131, Tel: 8528996
2. Machine Learning: Content
- How can a learning problem be defined?
- Designing a learning system: learning to play checkers
- Perspectives and questions in ML
3. Why Machine Learning? (1/10)
- Webster's definition of 'learn'
  - 'To gain knowledge, or understanding of, or skill in, by study, instruction or experience'
- Simon's definition (Machine Learning I, 1993, Chapter 2)
  - 'Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more effectively the next time'
- Donald Michie's definition (Computer Journal, 1991)
  - 'A learning system uses sample data to generate an updated basis for improved (performance) on subsequent data from the same source and expresses the new basis in intelligible symbolic form'
4. Why Machine Learning? (2/10)
- Machine learning is typically thought of as a sub-topic of artificial intelligence.
- It is inspired by several disciplines
- Figure labels: Machine Learning, Cognitive Science, Statistics, Pattern Recognition, Computer Science
5. Why Machine Learning? (3/10)
- Artificial intelligence: learning symbolic representations of concepts; ML as a search problem; prior knowledge + training examples guide the learning process
- Bayesian methods: calculating the probabilities of hypotheses; the Bayesian classifier
- Computational complexity theory: theoretical bounds on the complexity of different learning tasks, measured in terms of the computational effort, the number of training examples, and the number of mistakes required in order to learn
- Information theory: measures of entropy, minimum description length, optimal codes and their relationship to optimal training sequences for encoding a hypothesis
- Philosophy: Occam's razor, suggesting that the simplest hypothesis is the best
- Psychology and neurobiology: motivation for neural networks (NN); the power law of practice
- Statistics: characterisation of the errors (e.g. bias, variance) that occur when estimating the accuracy of a hypothesis based on a limited sample of data; confidence intervals, statistical tests
- Goal: description of the different learning paradigms, the algorithms, the theoretical results and applications
6. Why Machine Learning? (4/10)
- Availability of background knowledge
- Characteristics of the data
- Propositional or first-order
7. Why Machine Learning? (5/10)
- Induction, abduction, deduction
8. Why Machine Learning? (6/10)
9. Why Machine Learning? (7/10)
- Previously: Learning in the limit
- Now: PAC (Probably Approximately Correct)
- Addresses efficiency constraints
- Best-case analysis (Helpful Teacher Model)
- Average-case analysis (constraining assumptions)
- Empirical: when mathematical analysis isn't obvious
- Goal: Model human learning behaviour
- Method: Comparison with subject data
10. Why Machine Learning? (8/10)
- Knowledge-poor supervised learning
  - Given: a training set of annotated instances
  - To induce: a hypothesis (concept description)
- Knowledge-intensive supervised learning
  - Given: a set of training instances + a hypothesis of the target concept + background knowledge
  - To induce: a modified hypothesis (concept description) that is consistent with the domain theory and the training instances
- Unsupervised learning: clustering
  - Given: a set I of unclassified instances, which have no special target attribute
  - To do: create a set of clusters for I according to their presumed classes. Clusters need not be disjoint. Clusters can be hierarchically related.
11. Why Machine Learning? (9/10)
- Paradigms of knowledge-poor supervised learning:
  - Decision trees (ID3, TDIDT)
- Paradigms of knowledge-intensive supervised learning:
  - Explanation-based learning
  - Inductive logic programming
12. Why Machine Learning? (10/10)
- Importance: How can computers be programmed so that they 'learn'?
- Machine learning / natural learning
  - Data mining: automatic detection of regularities in large amounts of data
  - Implementation of software that cannot easily be programmed by hand
  - Self-adaptive programs: programs for playing games
- Theoretical results: connection among the number of training examples, the hypothesis and the expected error
13. How can the learning problem be defined
- Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E
- Example: Learning to play checkers
  - Task T: playing checkers
  - Performance measure P: the percentage of games won
  - Experience E: playing against itself
14. Content
- How can the learning problem be defined?
  - Choosing the training experience
  - Choosing the target function
  - Choosing the representation of the target function
  - Choosing a function approximation algorithm
- Designing a learning system: learning to play checkers
- Perspectives and questions in ML
15. Choosing the Training Experience (1/2)
- What experience is provided?
  - Direct or indirect feedback regarding the choices made by the system
    - Direct: individual checkers board states and the correct move for each
    - Indirect: move sequences and final outcomes. Problem: determining the degree to which each move in the sequence deserves credit or blame for the final outcome (credit assignment)
  - The degree to which the learning system controls the sequence of training examples
    - The teacher selects informative board states and provides the correct move for each
    - The learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move
    - The learner may have complete control over both the board states and the (indirect) training classifications, as it does when it learns by playing against itself with no teacher
16. Choosing the Training Experience (2/2)
- How well does the training experience represent the distribution of examples over which the final system performance P must be measured? Problem: learning is most reliable when the distribution of the training examples is identical to the distribution of the test examples, an assumption that may not hold in practice
- A checkers learning problem:
  - Performance measure P: percentage of games won in the world tournament
  - Training experience E: games played against itself
17. Choosing the Target Function (1/2)
- What type of knowledge will be learned, and how will it be used by the performance program?
  - Example: The program needs to learn how to choose the best move from any board state
  - $\mathit{ChooseMove}: B \to M$, where B is the set of legal board states and M is the set of legal moves
  - Problem: ChooseMove is difficult to learn if only indirect training experience is available to our system => learn instead an evaluation function $V: B \to \mathbb{R}$, where B is the set of legal board states and $\mathbb{R}$ denotes some real value
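The link between an evaluation function and move selection can be sketched as follows: given any function that scores board states, the program picks the legal move whose resulting board scores highest. This is only an illustrative sketch. The names `choose_move`, `legal_moves`, `apply_move` and `evaluate`, and the integer "toy board", are assumptions made for the example and not part of the slides.

```python
from typing import Callable, Iterable, TypeVar

Board = TypeVar("Board")
Move = TypeVar("Move")


def choose_move(
    board: Board,
    legal_moves: Callable[[Board], Iterable[Move]],
    apply_move: Callable[[Board, Move], Board],
    evaluate: Callable[[Board], float],
) -> Move:
    """Return the legal move whose successor board has the highest value."""
    moves = list(legal_moves(board))
    if not moves:
        raise ValueError("no legal moves available")
    return max(moves, key=lambda move: evaluate(apply_move(board, move)))


# Toy stand-in for checkers (an assumption for illustration only):
# a board is an integer, a move adds 1, 2 or 3 to it, and the
# evaluation function prefers boards close to 10.
def toy_legal_moves(board: int) -> list:
    return [1, 2, 3]


def toy_apply_move(board: int, move: int) -> int:
    return board + move


def toy_evaluate(board: int) -> float:
    return -abs(10 - board)


best_move = choose_move(6, toy_legal_moves, toy_apply_move, toy_evaluate)
print(best_move)
```

With this design the learner only has to acquire a good scoring function; move selection itself stays a fixed, simple search over the legal successors.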
18. Choosing the Target Function (2/2)
- Question: Definition of the target function V:
  - If b is a final board state that is won, then $V(b) = 100$
  - If b is a final board state that is lost, then $V(b) = -100$
  - If b is a final board state that is drawn, then $V(b) = 0$
  - If b is not a final state in the game, then $V(b) = V(b')$, where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally as well).
- Problem: While this definition specifies a value of V(b) for every board state b recursively, it is not usable by our checkers player because it is not efficiently computable
- Solution: Discover an operational description of the ideal target function V. This is difficult => learn some approximation $\hat{V}$ instead
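To see why the recursive definition is well defined but impractical, here is a minimal sketch on a hand-built toy game tree. It assumes the usual convention of 100 for a won final state, -100 for a lost one and 0 for a draw, and it assumes both sides play optimally (a minimax search). The tree encoding and the name `ideal_value` are hypothetical and chosen only for this example.

```python
from typing import List, Union

WIN, LOSS, DRAW = 100, -100, 0  # assumed values of final board states

# A node is either a final outcome (int) or a list of successor nodes.
Tree = Union[int, List["Tree"]]


def ideal_value(node: Tree, our_turn: bool = True) -> int:
    """Value of a state when both players play optimally to the end."""
    if isinstance(node, int):
        return node  # final board state: won, lost or drawn
    child_values = [ideal_value(child, not our_turn) for child in node]
    # We pick the best successor; the opponent picks the worst one for us.
    return max(child_values) if our_turn else min(child_values)


toy_tree = [
    [WIN, LOSS],  # first move: the opponent can force a loss for us
    [DRAW, WIN],  # second move: the opponent's best reply gives a draw
    [LOSS],       # third move: loses immediately
]
root_value = ideal_value(toy_tree)
print(root_value)
```

Every call has to explore the complete subtree below a state. For a real checkers game that tree is far too large to search, which is why an approximation is learned instead.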
19. Choosing a Function Approximation Algorithm (1/2)
- For any given board state b, the function $\hat{V}(b)$ will be calculated as a linear combination of the following board features:
  - bp(b): the number of black pieces on the board
  - rp(b): the number of red pieces on the board
  - bk(b): the number of black kings on the board
  - rk(b): the number of red kings on the board
  - bt(b): the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
  - rt(b): the number of red pieces threatened by black
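A linear combination of these six features can be sketched as below, assuming the form w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b) with a constant term w0. The class name `BoardFeatures`, the function name `v_hat` and the weight values are hypothetical; the weights are arbitrary numbers picked for the example, not learned values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BoardFeatures:
    bp: int  # number of black pieces
    rp: int  # number of red pieces
    bk: int  # number of black kings
    rk: int  # number of red kings
    bt: int  # number of black pieces threatened by red
    rt: int  # number of red pieces threatened by black

    def as_vector(self) -> List[float]:
        # The leading 1.0 pairs with the constant weight w0.
        return [1.0, self.bp, self.rp, self.bk, self.rk, self.bt, self.rt]


def v_hat(weights: Sequence[float], features: BoardFeatures) -> float:
    """Linear evaluation: w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt."""
    x = features.as_vector()
    if len(weights) != len(x):
        raise ValueError("expected one weight per feature plus w0")
    return sum(w * xi for w, xi in zip(weights, x))


# Arbitrary example weights (assumption): black material counts positively.
weights = [0.0, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]
board = BoardFeatures(bp=3, rp=1, bk=1, rk=0, bt=0, rt=1)
value = v_hat(weights, board)
print(value)
```

Reducing a board to seven numbers means learning only has to find seven weights, at the cost of ignoring everything about the position that the features do not capture.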
20. Choosing a Function Approximation Algorithm (2/2)
- Partial design of a checkers learning program:
  - Performance measure P: percentage of games won in the world tournament
  - Training experience E: games played against itself
  - Target function representation: $\hat{V}(b) = w_0 + w_1\,bp(b) + w_2\,rp(b) + w_3\,bk(b) + w_4\,rk(b) + w_5\,bt(b) + w_6\,rt(b)$
21. Choosing a Function Approximation Algorithm: Estimating
Training Values
- How to assign training values to the more numerous intermediate board states?
  - Approach: assign the training value $V_{train}(b)$ for any intermediate board state b to be $\hat{V}(\mathit{Successor}(b))$, where $\hat{V}$ is the learner's current approximation to V and where Successor(b) denotes the next board state following b for which it is again the program's turn to move.
  - Rule for estimating the training values: $V_{train}(b) \leftarrow \hat{V}(\mathit{Successor}(b))$
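The estimation rule, V_train(b) ← V̂(Successor(b)), can be sketched for one recorded game as follows. The sketch assumes that each board is already reduced to a feature vector with a leading 1 for the constant weight, that the trace lists only the states at which it was the program's turn, and that the final state receives its true outcome. For brevity it uses two features instead of six; all names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def v_hat(weights: Sequence[float], x: Sequence[float]) -> float:
    """Current linear approximation: sum of weight * feature value."""
    return sum(w * xi for w, xi in zip(weights, x))


def training_examples(
    trace: Sequence[Sequence[float]],
    final_value: float,
    weights: Sequence[float],
) -> List[Tuple[Sequence[float], float]]:
    """Pair every board in the trace with its estimated training value."""
    examples = []
    for i, x in enumerate(trace):
        if i + 1 < len(trace):
            # Intermediate state: use the current estimate of its successor.
            target = v_hat(weights, trace[i + 1])
        else:
            # Final state: the true outcome of the game is known.
            target = final_value
        examples.append((x, target))
    return examples


# Feature vectors [1, black pieces, red pieces] for three successive states.
trace = [[1.0, 3.0, 3.0], [1.0, 3.0, 2.0], [1.0, 3.0, 0.0]]
weights = [0.0, 1.0, -1.0]  # arbitrary current weights (assumption)
examples = training_examples(trace, final_value=100.0, weights=weights)
targets = [target for _, target in examples]
print(targets)
```

Because the targets come from the learner's own current estimate, they are only as good as the current weights; the estimates for states close to the end of the game tend to be the most reliable.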
22. Choosing a Function Approximation Algorithm: Adjusting the
Weights
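- LMS weight update rule (choosing the weights $w_i$ to best fit the set of training examples)
  - Best fit: minimise the squared error E between the training values and the values predicted by the hypothesis $\hat{V}$: $E \equiv \sum_{\langle b, V_{train}(b) \rangle} \left( V_{train}(b) - \hat{V}(b) \right)^2$, summed over all training examples
  - For each training example $\langle b, V_{train}(b) \rangle$
    - Use the current weights to calculate $\hat{V}(b)$
    - For each weight $w_i$, update $w_i \leftarrow w_i + c \left( V_{train}(b) - \hat{V}(b) \right) x_i$, where $x_i$ is the value of the i-th board feature (with $x_0 = 1$ for the constant term) and c is a small constant that moderates the size of the weight update.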
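One LMS step can be sketched as follows: compute the current prediction, then move each weight in proportion to the error and to the corresponding feature value. The function names, the three-element feature vector and the value c = 0.1 are assumptions made for this example.

```python
from typing import List, Sequence


def v_hat(weights: Sequence[float], x: Sequence[float]) -> float:
    """Current linear approximation: sum of weight * feature value."""
    return sum(w * xi for w, xi in zip(weights, x))


def lms_update(
    weights: Sequence[float],
    x: Sequence[float],
    v_train: float,
    c: float = 0.1,
) -> List[float]:
    """Return new weights after one LMS step on a single training example."""
    error = v_train - v_hat(weights, x)
    # Each weight changes in proportion to the error and its feature value,
    # so features that are 0 on this board leave their weight untouched.
    return [w + c * error * xi for w, xi in zip(weights, x)]


old_weights = [0.0, 0.0, 0.0]
x = [1.0, 2.0, 1.0]  # leading 1.0 pairs with the constant weight w0
v_train = 10.0

new_weights = lms_update(old_weights, x, v_train, c=0.1)
print(new_weights)
print(v_hat(old_weights, x), v_hat(new_weights, x))
```

In this example the prediction moves from 0 to 6, so the error on the training example shrinks from 10 to 4 after a single step. A larger c would take bigger steps and can overshoot the target.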
23. Some Issues in Machine Learning
- What algorithms can approximate functions well (and when)?
- How does the number of training examples influence the accuracy?
- How does the complexity of the hypothesis representation impact
it?
- How does noisy data influence the accuracy?
- What are the theoretical limits of learnability?
- How can prior knowledge of the learner help?
- What clues can we get from a biological learning system?
- How can systems alter their own representation?
24. Summary
- Goal: building computer programs that improve their performance at some task through experience
  - Data mining: automatically discovering implicit regularities in large data sets
  - Poorly understood domains where humans might not have the knowledge needed to develop effective algorithms
  - Domains where the program must dynamically adapt to changing conditions
- ML draws on ideas from several sets of disciplines, including artificial intelligence, probability and statistics, computational complexity, information theory, psychology and neurobiology, control theory and philosophy
- Well-defined learning problem = well-specified task + performance metric + source of training examples