
COMBINING BASE LEARNERS

[Figure: Algo 1, Algo 2, Algo 3, …, Algo N combined by a Meta-Learning Algo]

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

RATIONALE

No Free Lunch theorem: There is no algorithm that is always the most accurate.

Generate a group of algorithms that, when combined, achieve higher accuracy than any one of them alone.

DATA AND ALGO VARIATION

Different Algorithms

Different Datasets

Bagging

• Bagging can easily be extended to regression.

• Bagging is most effective when the base-learner is unstable, that is, when small changes in the training set lead to large changes in the learned model.

• Bagging typically increases accuracy.

• Interpretability is lost.
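The bagging procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base learner is a hypothetical 1-D threshold classifier (`train_stump`), labels are in {-1, +1}, and the names and the toy data are not from the lecture.

```python
import random
from collections import Counter

def train_stump(points):
    """Fit a 1-D threshold classifier (a deliberately simple, hypothetical base learner).

    points: list of (x, y) pairs with y in {-1, +1}. Returns (threshold, polarity).
    """
    xs = sorted({x for x, _ in points})
    # Candidate thresholds: one below the minimum, plus midpoints between neighbours.
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for thr in candidates:
        for polarity in (+1, -1):
            errors = sum(1 for x, y in points
                         if (polarity if x > thr else -polarity) != y)
            if best is None or errors < best[0]:
                best = (errors, thr, polarity)
    return best[1], best[2]

def stump_predict(model, x):
    thr, polarity = model
    return polarity if x > thr else -polarity

def bagging_fit(points, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw N examples with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(train_stump(sample))
    return models

def bagging_predict(models, x):
    # Aggregate by majority vote over the base learners.
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

data = [(0.0, -1), (1.0, -1), (2.0, -1), (3.0, +1), (4.0, +1), (5.0, +1)]
ensemble = bagging_fit(data)
print(bagging_predict(ensemble, 0.0), bagging_predict(ensemble, 5.0))
```

For regression, the majority vote would be replaced by an average of the base-learner outputs.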

“Breiman's work bridged the gap between statisticians and computer scientists, particularly in the field of machine learning. Perhaps his most important contributions were his work on classification and regression trees and ensembles of trees fit to bootstrap samples.

Bootstrap aggregation was given the name bagging by Breiman. Another of Breiman's ensemble approaches is the random forest.” (Extracted from Wikipedia).

Boosting

• Boosting tries to combine weak learners into a strong learner.

• Initially all examples have the same weight; in each subsequent iteration, the weights of misclassified examples are increased.

• Boosting can be applied to any learner.

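Boosting

Initialize all weights w_i to 1/N (N: number of examples); set error = 0.

Repeat (until error > 0.5 or the maximum number of iterations is reached):

Train a classifier on the weighted examples and get hypothesis h_t(x).

Compute the error as the sum of the weights of the misclassified examples: error = Σ_i w_i over all i such that x_i is incorrectly classified.

Set α_t = ½ log( (1 - error) / error )

Update the weights: w_i = [ w_i e^(-α_t y_i h_t(x_i)) ] / Z, where Z normalizes the weights to sum to 1 and y_i, h_t(x_i) ∈ {-1, +1}.

Output f(x) = sign ( Σ_t α_t h_t(x) )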
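The boosting loop above can be sketched as follows. This is a minimal illustration, not a reference implementation: the weak learner is a hypothetical weighted 1-D threshold classifier, labels are assumed to be in {-1, +1}, and α_t is computed as ½ log((1 - error)/error), the coefficient that standard AdaBoost pairs with the exponential weight update.

```python
import math

def train_weighted_stump(xs, ys, w):
    """Return (weighted error, threshold, polarity) of the best 1-D threshold classifier."""
    sorted_xs = sorted(set(xs))
    candidates = [sorted_xs[0] - 1.0] + [(a + b) / 2
                                         for a, b in zip(sorted_xs, sorted_xs[1:])]
    best = None
    for thr in candidates:
        for pol in (+1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (pol if x > thr else -pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def boost(xs, ys, max_iter=10):
    n = len(xs)
    w = [1.0 / n] * n                        # all weights start at 1/N
    ensemble = []                            # list of (alpha_t, threshold, polarity)
    for _ in range(max_iter):
        error, thr, pol = train_weighted_stump(xs, ys, w)
        if error > 0.5:                      # weak learner is worse than chance: stop
            break
        error = min(max(error, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, thr, pol))
        # y * h = -1 for misclassified examples, so their weights grow.
        w = [wi * math.exp(-alpha * y * (pol if x > thr else -pol))
             for x, y, wi in zip(xs, ys, w)]
        z = sum(w)                           # normalizer Z
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (pol if x > thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single threshold can separate.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [+1, +1, -1, -1, +1, +1]
ensemble = boost(xs, ys, max_iter=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data a single threshold misclassifies at least two examples, whereas the weighted vote of three thresholds fits all six.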

Boosting

[Figure: the weights of misclassified examples are increased]

DATA AND ALGO VARIATION

Different Algorithms

Different Datasets

MIXTURE OF EXPERTS

Voting where weights are input-dependent (gating) (Jacobs et al., 1991)

Experts or gating can be nonlinear
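A minimal sketch of input-dependent voting is given below. It assumes linear experts and a softmax gating function, which is one common choice; the parameter values are hand-set and purely hypothetical. The output is the sum of the expert outputs d_j(x) weighted by the gating values g_j(x).

```python
import math

def softmax(scores):
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(params, x):
    """params = (weights, bias); returns the dot product weights . x + bias."""
    weights, bias = params
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias

def mixture_of_experts(x, experts, gates):
    """Combine expert outputs with voting weights that depend on the input x."""
    g = softmax([linear(p, x) for p in gates])   # gating weights, they sum to 1
    d = [linear(p, x) for p in experts]          # one output per expert
    return sum(gj * dj for gj, dj in zip(g, d)), g

# Hypothetical parameters: expert 0 is trusted for x < 0, expert 1 for x > 0.
experts = [([-1.0], 0.0), ([2.0], 0.0)]
gates = [([-5.0], 0.0), ([5.0], 0.0)]

y, g = mixture_of_experts([3.0], experts, gates)
print(y, g)
```

With constant gating weights this reduces to ordinary weighted voting; the gating function is what lets each expert specialize in a region of the input space.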

MIXTURE OF EXPERTS

Robert Jacobs, University of Rochester

Michael Jordan, UC Berkeley

Stacking

• Variation comes from the learners: different algorithms are applied to the same data.

• The predictions of the base learners form a new meta-dataset.

• A test example is first transformed into a meta-example (the vector of base-learner predictions) and then classified by the meta-learner.

• Several variants of stacking have been proposed.
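The stacking pipeline can be sketched as follows. The base learners (one threshold classifier per feature) and the meta-learner (a lookup table storing the majority label for each meta-example) are hypothetical stand-ins chosen to keep the example short. For brevity the meta-dataset is built from predictions on the training set itself; in practice held-out predictions are normally used so that the meta-learner does not overfit.

```python
from collections import Counter, defaultdict

def train_stump(rows, labels, feature):
    """Base learner: best threshold on a single feature."""
    values = sorted({r[feature] for r in rows})
    best = None
    for thr in [(a + b) / 2 for a, b in zip(values, values[1:])]:
        for pol in (1, -1):
            errs = sum(1 for r, y in zip(rows, labels)
                       if (pol if r[feature] > thr else -pol) != y)
            if best is None or errs < best[0]:
                best = (errs, thr, pol)
    _, thr, pol = best
    return lambda r: pol if r[feature] > thr else -pol

def train_table(meta_rows, labels):
    """Meta-learner: majority label for each distinct meta-example."""
    counts = defaultdict(Counter)
    for m, y in zip(meta_rows, labels):
        counts[m][y] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {m: c.most_common(1)[0][0] for m, c in counts.items()}
    return lambda m: table.get(m, default)

def to_meta(base_learners, row):
    # A meta-example is the tuple of base-learner predictions.
    return tuple(h(row) for h in base_learners)

def stack_fit(rows, labels):
    base = [train_stump(rows, labels, f) for f in range(len(rows[0]))]
    meta_rows = [to_meta(base, r) for r in rows]
    return base, train_table(meta_rows, labels)

def stack_predict(model, row):
    base, meta = model
    return meta(to_meta(base, row))

# AND-like toy problem: positive only when both features are high.
rows = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, -1, -1, +1]
model = stack_fit(rows, labels)
print([stack_predict(model, r) for r in rows])
```

Neither base learner solves the problem alone, but the meta-learner sees both predictions and learns how to combine them.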

Cascade Generalization

• Variation comes from the learners: different algorithms are applied to the same data.

• Classifiers are used in sequence rather than in parallel as in stacking.

• The prediction of the first classifier is added to the example feature vector to form an extended dataset.

• The process can go on through many iterations.
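The sequential extension of the feature vector can be sketched as follows. The two learners used here (a threshold classifier followed by 1-nearest-neighbour) and the toy data are assumptions made for illustration; any sequence of training functions with the same interface would fit.

```python
def train_stump(rows, labels):
    """Pick the single feature/threshold pair with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for thr in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for pol in (1, -1):
                errs = sum(1 for r, y in zip(rows, labels)
                           if (pol if r[f] > thr else -pol) != y)
                if best is None or errs < best[0]:
                    best = (errs, f, thr, pol)
    _, f, thr, pol = best
    return lambda r: pol if r[f] > thr else -pol

def train_1nn(rows, labels):
    """1-nearest-neighbour classifier with squared Euclidean distance."""
    stored = list(zip(rows, labels))
    def predict(r):
        return min(stored,
                   key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], r)))[1]
    return predict

def cascade_fit(rows, labels, trainers):
    rows = [tuple(r) for r in rows]
    models = []
    for train in trainers:
        model = train(rows, labels)
        models.append(model)
        # Append this classifier's prediction to every example for the next level.
        rows = [r + (model(r),) for r in rows]
    return models

def cascade_predict(models, row):
    row = tuple(row)
    for model in models[:-1]:
        row = row + (model(row),)      # same extension as during training
    return models[-1](row)

rows = [(0.0, 0.2), (1.0, 0.9), (2.0, 0.1), (3.0, 0.8), (4.0, 0.3), (5.0, 0.7)]
labels = [-1, -1, -1, +1, +1, +1]
models = cascade_fit(rows, labels, [train_stump, train_1nn])
print(cascade_predict(models, (2.6, 0.5)), cascade_predict(models, (2.4, 0.5)))
```

Adding more entries to the list of training functions gives further iterations, each one seeing the predictions of all earlier classifiers as extra features.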

Cascading

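• As in boosting, the training distribution changes from one learner to the next.

• Unlike boosting, however, the classifiers themselves also vary.

• Classification is based on prediction confidence: an example is passed to the next classifier only when the current one is not confident enough.

• Cascading creates rules that account for most instances, catching exceptions at the final step.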
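The prediction side of cascading can be sketched as follows; training each stage on the examples the previous stage was unsure about is omitted. The stage model, the confidence threshold of 0.9, and the stored exceptions are all hypothetical values chosen for illustration.

```python
import math

def cascade_classify(x, stages, fallback):
    """Return (label, index of the stage that decided).

    stages: list of (predict_proba, threshold) pairs ordered from simple to complex;
    predict_proba(x) returns a dict mapping each label to an estimated probability.
    """
    for index, (predict_proba, threshold) in enumerate(stages):
        probs = predict_proba(x)
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            return label, index            # confident enough: stop here
    return fallback(x), len(stages)        # exceptions are handled at the final step

def simple_stage(x):
    """Hypothetical first stage: a fixed logistic rule on a 1-D input."""
    p = 1.0 / (1.0 + math.exp(-4.0 * x))
    return {"pos": p, "neg": 1.0 - p}

# Hypothetical stored exceptions near the boundary, used by a 1-nearest-neighbour fallback.
exceptions = [(-0.2, "pos"), (0.3, "neg")]

def nearest_exception(x):
    return min(exceptions, key=lambda e: abs(e[0] - x))[1]

stages = [(simple_stage, 0.9)]
print(cascade_classify(2.0, stages, nearest_exception))
print(cascade_classify(0.1, stages, nearest_exception))
```

Inputs far from the decision boundary are settled by the cheap first stage, and only the uncertain ones reach the more expensive final step.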

ERROR-CORRECTING OUTPUT CODES

[Figure: example code matrix W]

K classes; L problems (Dietterich and Bakiri, 1995)

Code matrix W codes classes in terms of learners

One per class: L = K

Pairwise: L = K(K-1)/2


Full code: L = 2^(K-1) - 1

With reasonable L, find W such that the Hamming distances between rows and between columns are maximized.
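The role of the code matrix can be illustrated with a short sketch. It assumes entries in {-1, +1}, one row of W per class and one column per binary learner, and decodes by choosing the row nearest in Hamming distance; the function names are hypothetical. The full code is built from all distinct non-constant columns, which gives 2^(K-1) - 1 of them.

```python
from itertools import product

def one_per_class(k):
    """K x K code matrix: +1 on the diagonal, -1 elsewhere."""
    return [[1 if i == j else -1 for j in range(k)] for i in range(k)]

def full_code(k):
    """K x (2^(K-1) - 1) code matrix.

    Fixing the first entry of each column to +1 removes complementary duplicates;
    the constant all-(+1) column is dropped because it defines no learning problem.
    """
    columns = [(1,) + rest for rest in product((1, -1), repeat=k - 1)]
    columns = [c for c in columns if any(v == -1 for v in c)]
    return [[c[i] for c in columns] for i in range(k)]

def hamming(a, b):
    return sum(1 for u, v in zip(a, b) if u != v)

def decode(w, outputs):
    """Return the class whose code word (row of W) is closest to the learner outputs."""
    return min(range(len(w)), key=lambda i: hamming(w[i], outputs))

W = full_code(4)
outputs = list(W[2])
outputs[0] = -outputs[0]          # simulate one learner making a mistake
print(len(W[0]), decode(W, outputs))
```

For K = 4 any two rows of the full code differ in four positions, so a single wrong learner still leaves the true class as the nearest code word.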

