vorl1.ppt

1. Machine Learning: Overview. PD Dr. Gabriella Kókai, [email_address], Friedrich-Alexander-Universität, Lehrstuhl für Informatik 2, Raum 04.131, Tel: 8528996

Why Machine Learning?

How can a learning problem be defined?

Designing a learning system: learning to play checkers

Perspectives and questions in ML

Summary

Webster's definition of 'learn'

'To gain knowledge or understanding of, or skill in, by study, instruction, or experience'

Simon's definition (Machine Learning I, 1983, Chapter 2)

'Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more effectively the next time'

Donald Michie's definition (Computer Journal, 1991)

'A learning system uses sample data to generate an updated basis for improved (performance) on subsequent data from the same source and expresses the new basis in intelligible symbolic form'

Machine learning is typically thought of as a sub-topic of artificial intelligence.

It is inspired by several disciplines

Relevant topics:

Artificial intelligence: Learning symbolic representations of concepts, ML as a search problem, prior knowledge plus training examples guiding the learning process

Bayesian methods: Calculating the probabilities of hypotheses, Bayesian classifiers

Computational complexity theory: Theoretical bounds on the complexity of different learning tasks, measured in terms of the computational effort, the number of training examples, and the number of mistakes required in order to learn

Information theory: Measures of entropy, minimum description length, optimal codes and their relationship to optimal training sequences for encoding a hypothesis

Philosophy: Occam's razor, suggesting that the simplest hypothesis is the best

Psychology and neurobiology: Motivation for neural networks (NN), the power law of practice

Statistics: Characterisation of the errors (e.g. bias, variance) that occur when estimating the accuracy of a hypothesis based on a limited sample of data, confidence intervals, statistical tests

Goal: Description of the different learning paradigms, the algorithms, the theoretical results and applications

Dimension: Constraints

Task/objective

Learning task

Performance task

Availability of the background knowledge

Encoded

Interactive

Availability of data

Incremental vs. batch

Passive vs. active

Characteristics of the data

Static vs. drifting

Propositional or first-order

Dimension: Approach

Search mechanism

Top-Down (model driven)

Bottom-up (data driven)

Many others

Reasoning methods

Induction, abduction, deduction

Deductive Reasoning:

Inductive Reasoning:

Abductive Reasoning:

Evaluation Methodologies

Mathematical

Previously: Learning in the limit

Now: PAC (Probably Approximately Correct)

More tolerant

Addresses efficiency constraints

Recent:

Best-case analysis (Helpful Teacher Model)

Average-case analysis (constraining assumption)

Empirical: When mathematical analysis isn't obvious

Popular

Data intensive

Psychological

Goal: Model human learning behaviour

Method: Comparison with subject data

Knowledge-Poor Supervised Learning

Given: A training set of annotated instances

To induce: A hypothesis (concept description)
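As an illustration of this setting, the following minimal sketch uses Find-S, a simple concept learner chosen here for brevity (the slide itself names no algorithm). The attributes, the toy training set, and the helper names `find_s` and `covers` are assumptions made for the example.

```python
ANY = "?"  # wildcard: any value is acceptable for this attribute


def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positive examples.

    `examples` is a list of (attributes, is_positive) pairs (hypothetical format).
    """
    hypothesis = None  # None means "covers nothing yet"
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)
        else:
            # Generalise every attribute that disagrees with the new positive example.
            hypothesis = [h if h == a else ANY for h, a in zip(hypothesis, attributes)]
    return tuple(hypothesis) if hypothesis is not None else None


def covers(hypothesis, attributes):
    """True if the hypothesis classifies the instance as positive."""
    return all(h == ANY or h == a for h, a in zip(hypothesis, attributes))


# Annotated instances: (sky, temperature, wind) -> label (made-up data)
TRAINING_SET = [
    (("sunny", "warm", "strong"), True),
    (("sunny", "warm", "weak"), True),
    (("rainy", "cold", "strong"), False),
]

hypothesis = find_s(TRAINING_SET)
print(hypothesis)
```

On this toy data the induced concept description is `('sunny', 'warm', '?')`, which is consistent with all three annotated instances.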

Knowledge-Intensive Supervised Learning

Given: A set of training instances + a hypothesis of the target concept + background knowledge

To induce: A modified hypothesis (concept description) that is consistent with the domain theory and the training instances

Unsupervised learning: clustering

Given: A set of unclassified instances I that have no special target attribute

To do: Create a set of clusters for I according to their presumed classes. Clusters need not be disjoint. Clusters can be hierarchically related.
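A minimal sketch of the clustering task, assuming one-dimensional numeric instances and using k-means as a stand-in technique (the slide does not prescribe an algorithm). Note that k-means yields flat, disjoint clusters, which is only one special case of the setting described above. The function name `k_means_1d`, the data, and the initial centres are hypothetical.

```python
def k_means_1d(values, centres, iterations=10):
    """Assign each value to its nearest centre, then move each centre to its cluster mean."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters, centres


# Unclassified instances: no target attribute is given (made-up data).
INSTANCES = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
clusters, centres = k_means_1d(INSTANCES, centres=[0.0, 5.0])
print(clusters, centres)
```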

Paradigms of knowledge-poor supervised learning:

Concept learning

Decision trees (ID3, TDIDT)

Rule based

Lazy learning

Genetic algorithms

Neural networks

Bayesian networks

Paradigms of knowledge-intensive supervised learning:

Explanation-based learning

Inductive Logic Programming

Unsupervised learning

Bayesian learning

Clustering

Importance: How can computers be programmed so that they 'learn'?

Machine learning vs. natural learning

Application areas

Data mining: automatic detection of regularities in large amounts of data

Implementation of software that cannot easily be programmed by hand

Self-adaptive programs: programs for playing games

Theoretical results: Relationship between the number of training examples, the hypothesis, and the expected error

Biological studies

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E

Example: Learning to play checkers

Task T: design a program to learn to play checkers

Performance measure P: The percentage of the games won

Experience E: Playing against itself

Why Machine Learning?

How can the learning problem be defined?

Choosing the training experience

Choosing the target function

Choosing the representation of the target function

Choosing a function approximation algorithm

Designing a learning system: learning to play checkers

Perspectives and questions in ML

Summary

What experience is provided?

Direct or indirect feedback regarding the choices made by the system

Direct: Individual checkers board states and the correct move for each

Indirect: Move sequences and final outcomes. Problem: determining the degree to which each move in the sequence deserves credit or blame for the final outcome (credit assignment)

The degree to which the learning system controls the sequence of training examples

The teacher selects informative board states and provides the correct move for each

The learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move

The learner may have complete control over both the board states and the (indirect) training classification, as it does when it learns by playing against itself with no teacher

How well does it represent the distribution of examples over which the final system performance P must be measured? Problem: Learning theory usually assumes that the distribution of the training examples is identical to the distribution of the test examples, an assumption that is often violated in practice

A checkers learning problem:

Task T: playing checkers

Performance measure P: percentage of games won in the world tournament

Training experience E: games played against itself

What type of knowledge will be learned and how will this be used by the performance program?

Example: The program needs to learn how to choose the best move from any board state

ChooseMove: $B \to M$, where B is the set of legal board states and M is the set of legal moves

Problem: ChooseMove is difficult to learn if only indirect training experience is available to our system => use an evaluation function $V : B \to \mathbb{R}$ instead, where B is the set of legal board states and $\mathbb{R}$ is the set of real values

Question: Definition of the target function V:

If b is a final board state that is won, then $V(b) = 100$

If b is a final board state that is lost, then $V(b) = -100$

If b is a final board state that is drawn, then $V(b) = 0$

If b is not a final state in the game, then $V(b) = V(b')$, where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally as well).

Problem: While this recursive definition specifies a value of V(b) for every board state b, it is not usable by our checkers player because it is not efficiently computable

Solution: Discover an operational description of the ideal target function V. This is difficult => learn some approximation $\hat{V}$
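To make the recursive definition concrete, here is a minimal sketch on a much smaller game than checkers. The game is an assumption chosen for illustration: players alternately remove 1 or 2 stones from a pile, and whoever takes the last stone wins (so no draws occur). The function name `ideal_value` is hypothetical. The recursion has to explore the game tree down to final states, which is exactly what makes the ideal V impractical for checkers.

```python
WIN, LOSS = 100, -100


def ideal_value(stones, program_to_move=True):
    """Value of a state under optimal play by both sides, from the program's point of view."""
    if stones == 0:
        # Final state: the player who just moved took the last stone and won.
        return LOSS if program_to_move else WIN
    # Non-final state: its value is the value of the best reachable final state,
    # assuming the program maximises and the opponent minimises.
    successors = [
        ideal_value(stones - take, not program_to_move)
        for take in (1, 2)
        if take <= stones
    ]
    return max(successors) if program_to_move else min(successors)


for pile in range(1, 7):
    print(pile, ideal_value(pile))
```

In this toy game every pile size that is a multiple of 3 is lost for the player to move, which the recursion recovers without being told.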

How can $\hat{V}$ be represented?

For any given board state b, $\hat{V}(b)$ will be calculated as a linear combination of the following board features:

bp(b): the number of black pieces on the board

rp(b): the number of red pieces on the board

bk(b): the number of black kings on the board

rk(b): the number of red kings on the board

bt(b): the number of black pieces threatened by red (i.e., which can be captured on red's next turn)

rt(b): the number of red pieces threatened by black

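Partial design of a checkers learning program:

Task T: playing checkers

Performance measure P: percentage of games won in the world tournament

Training experience E: games played against itself

Target function: $V : B \to \mathbb{R}$

Target function representation: $\hat{V}(b) = w_0 + w_1 \cdot bp(b) + w_2 \cdot rp(b) + w_3 \cdot bk(b) + w_4 \cdot rk(b) + w_5 \cdot bt(b) + w_6 \cdot rt(b)$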
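A minimal sketch of such a linear evaluation function, assuming the representation is a constant weight w0 plus one weight per board feature (bp, rp, bk, rk, bt, rt). The class `BoardFeatures`, the function `v_hat`, and the example weights are hypothetical; real weights would be learned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardFeatures:
    bp: int  # black pieces
    rp: int  # red pieces
    bk: int  # black kings
    rk: int  # red kings
    bt: int  # black pieces threatened by red
    rt: int  # red pieces threatened by black

    def as_vector(self):
        # The leading 1 pairs with the constant weight w0.
        return [1, self.bp, self.rp, self.bk, self.rk, self.bt, self.rt]


def v_hat(weights, features):
    """Linear combination w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt."""
    return sum(w * x for w, x in zip(weights, features.as_vector()))


# Hand-picked weights, for illustration only.
EXAMPLE_WEIGHTS = [0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5]
board = BoardFeatures(bp=3, rp=1, bk=1, rk=0, bt=0, rt=1)
score = v_hat(EXAMPLE_WEIGHTS, board)
print(score)  # 5.5
```

Reducing a board to seven numbers keeps the learning problem small: the learner only has to estimate the weights w0 to w6 instead of a value for every board state.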

How to assign training values to the more numerous intermediate board states?

Approach: assign the training value $V_{train}(b)$ for any intermediate board state b to be $\hat{V}(Successor(b))$, where $\hat{V}$ is the learner's current approximation to V and where Successor(b) denotes the next board state following b for which it is again the program's turn to move.

Rule for estimating the training values: $V_{train}(b) \leftarrow \hat{V}(Successor(b))$
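The estimation rule can be sketched as follows, assuming each board is already reduced to a feature vector with a leading 1 for the constant weight, and that a game is recorded as the sequence of boards at which the program had to move, ending with the final board. The names `v_hat` and `training_values`, the three-feature vectors, and the weights are assumptions for the example.

```python
def v_hat(weights, x):
    """Current approximation: linear combination of the feature vector x."""
    return sum(w * xi for w, xi in zip(weights, x))


def training_values(trace, final_value, weights):
    """Pair every board in the trace with its training value.

    Intermediate boards get the current estimate of their successor;
    the final board gets the true outcome (100, -100 or 0).
    """
    pairs = []
    for i, x in enumerate(trace):
        if i + 1 < len(trace):
            target = v_hat(weights, trace[i + 1])  # V_train(b) <- V_hat(Successor(b))
        else:
            target = final_value
        pairs.append((x, target))
    return pairs


# Toy trace with features [1, bp, rp]; the program (black) eventually wins.
WEIGHTS = [0.0, 1.0, -1.0]
TRACE = [[1, 3, 3], [1, 3, 2], [1, 3, 0]]
pairs = training_values(TRACE, final_value=100.0, weights=WEIGHTS)
print([target for _, target in pairs])  # [1.0, 3.0, 100.0]
```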

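LMS weight update rule (choosing the weights $w_i$ to best fit the set of training examples)

Best fit: minimise the squared error E between the training values and the values predicted by the hypothesis $\hat{V}$: $E \equiv \sum_{\langle b, V_{train}(b) \rangle \in \text{training examples}} (V_{train}(b) - \hat{V}(b))^2$

For each training example $\langle b, V_{train}(b) \rangle$:

Use the current weights to calculate $\hat{V}(b)$

For each weight $w_i$, update $w_i \leftarrow w_i + c \cdot x_i \cdot (V_{train}(b) - \hat{V}(b))$, where $x_i$ is the value of the i-th board feature ($x_0 = 1$ for the constant weight $w_0$) and c is a small constant that moderates the size of the weight update.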
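A minimal sketch of the LMS rule, assuming feature vectors with a leading 1 for the constant weight and a step size c = 0.1. The function names, the three-feature toy examples, and the number of passes are assumptions made for illustration, not values from the lecture.

```python
def v_hat(weights, x):
    """Value predicted by the current weights for feature vector x."""
    return sum(w * xi for w, xi in zip(weights, x))


def lms_update(weights, x, v_train, c=0.1):
    """One LMS step: move each weight in proportion to the error and its feature value."""
    error = v_train - v_hat(weights, x)
    return [w + c * error * xi for w, xi in zip(weights, x)]


def squared_error(weights, examples):
    """Sum of squared differences between training values and predictions."""
    return sum((v - v_hat(weights, x)) ** 2 for x, v in examples)


# Toy training examples (feature vector, training value), made up for the sketch.
EXAMPLES = [([1, 2, 0], 4.0), ([1, 0, 2], -4.0), ([1, 1, 1], 0.0)]

weights = [0.0, 0.0, 0.0]
error_before = squared_error(weights, EXAMPLES)
for _ in range(200):  # repeated passes over the training examples
    for x, v_train in EXAMPLES:
        weights = lms_update(weights, x, v_train)
error_after = squared_error(weights, EXAMPLES)
print(error_before, error_after)
```

Each step changes the weights only slightly, and only for features that are non-zero in the current example, so a feature that does not occur on a board leaves its weight untouched.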

What algorithms can approximate functions well (and when)?

How does the number of training examples influence the accuracy?

How does the complexity of the hypothesis representation impact it?

How does noisy data influence the accuracy?

What are the theoretical limits of learnability?

How can prior knowledge of the learner help?

What clues can we get from biological learning systems?

How can systems alter their own representation?

Goal: Building computer programs that improve their performance at some task through experience

Application domains:

Data mining: automatically discover implicit regularities in large data sets

Poorly understood domains where humans might not have the knowledge needed to develop effective algorithms

Domains where the program must dynamically adapt to changing conditions

ML draws on ideas from a diverse set of disciplines, including artificial intelligence, probability and statistics, computational complexity, information theory, psychology and neurobiology, control theory, and philosophy

Well-defined learning problem = well-specified task + performance metric + source of training examples
