• This supervised learning technique uses Bayes’ rule but is different in philosophy from the well known work of Aitken, Taroni, et al.

• Bayes’ rule:

  Pr(G_i | x) = Pr(x | G_i) · Pr(G_i) / Pr(x)

• Pr denotes probability; G_i is group i and x is the observed evidence

• The equation means: “How does the probability of an item being a member of group i change, given evidence x?”
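*A minimal numeric sketch of this update in R, with made-up priors and likelihoods (the group names G1/G2 and all values are illustrative, not from the slides):

  # Assumed priors Pr(G_i) and likelihoods Pr(x | G_i) for two groups
  prior      <- c(G1 = 0.5,  G2 = 0.5)
  likelihood <- c(G1 = 0.20, G2 = 0.05)

  # Bayes' rule: posterior = likelihood * prior / Pr(x),
  # where Pr(x) = sum_i Pr(x | G_i) * Pr(G_i)
  posterior <- likelihood * prior / sum(likelihood * prior)
  posterior   # G1 = 0.8, G2 = 0.2: the evidence x shifts belief toward group 1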

Bayesian Discriminant Analysis

• The prior probability Pr(G_i) must be chosen in advance: this can be a problem!

Bayesian Discriminant Analysis

• Bayes’ rule can be turned into a classification rule:

  If Pr(G_1 | x) > Pr(G_2 | x)  =>  choose group 1

*If the priors are both 0.5, the decision boundaries are where the likelihood curves cross
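*A small 1D sketch of this rule in R; the Gaussian means, standard deviations, and priors are illustrative values chosen so the curves cross at x = 1.5:

  # Assumed 1D Gaussian likelihoods for two groups
  mu    <- c(G1 = 0, G2 = 3)
  sigma <- c(G1 = 1, G2 = 1)
  prior <- c(G1 = 0.5, G2 = 0.5)

  classify <- function(x) {
    # Unnormalised posteriors Pr(x | G_i) * Pr(G_i); the normaliser Pr(x)
    # is the same for every group, so it can be ignored when classifying
    post <- setNames(dnorm(x, mean = mu, sd = sigma) * prior, names(prior))
    names(which.max(post))
  }

  classify(0.5)   # "G1"
  classify(2.0)   # "G2": with equal priors the boundary sits where the curves cross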

Bayes-Gaussian Discriminant Analysis

• If the data is multivariate normal and drawn from the same population (i.e. the groups share a common covariance matrix Σ), the decision rule becomes: assign x to the group with the largest “distance” d_i(x),

• with the “distance” defined as:

  d_i(x) = μ_iᵀ Σ⁻¹ x − ½ μ_iᵀ Σ⁻¹ μ_i + ln Pr(G_i)

• where μ_i is the mean of group i and Σ is the pooled covariance matrix (like an average of the group covariance matrices)

• Note that if the data is just 1D this is just an equation for a line: a slope term in x plus an intercept
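*A sketch of this linear rule written directly from the formula above; the simulated two-group data, means, and priors are illustrative, and this is essentially the rule that MASS::lda applies:

  # Two simulated 2D groups sharing one covariance structure
  set.seed(1)
  n  <- 50
  X  <- rbind(cbind(rnorm(n, 0), rnorm(n, 0)),    # group 1 around (0, 0)
              cbind(rnorm(n, 2), rnorm(n, 2)))    # group 2 around (2, 2)
  g  <- rep(c("G1", "G2"), each = n)

  mu    <- lapply(split(as.data.frame(X), g), colMeans)      # group means
  S     <- lapply(split(as.data.frame(X), g), cov)           # group covariances
  Sp    <- ((n - 1) * S$G1 + (n - 1) * S$G2) / (2 * n - 2)   # pooled ("average") covariance
  prior <- c(G1 = 0.5, G2 = 0.5)

  # d_i(x) = mu_i' Sp^-1 x - 1/2 mu_i' Sp^-1 mu_i + ln Pr(G_i)
  d_score <- function(x, m, p) {
    drop(t(m) %*% solve(Sp) %*% x) - 0.5 * drop(t(m) %*% solve(Sp) %*% m) + log(p)
  }
  classify <- function(x) {
    scores <- c(G1 = d_score(x, mu$G1, prior[["G1"]]),
                G2 = d_score(x, mu$G2, prior[["G2"]]))
    names(which.max(scores))
  }

  classify(c(0.2, 0.1))   # expected "G1"
  classify(c(1.8, 2.2))   # expected "G2"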

Bayes-Gaussian Discriminant Analysis

• If the data is multivariate normal but drawn from different populations (each group i keeps its own covariance matrix Σ_i), the decision rule is the same but the “decision distance” becomes:

  d_i(x) = −½ ln|Σ_i| − ½ (x − μ_i)ᵀ Σ_i⁻¹ (x − μ_i) + ln Pr(G_i)

• Note that if the data is just 1D this is an equation for a parabola, a·x² + b·x + c: the new quadratic term appears because each group has its own covariance
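*A short sketch of the quadratic rule using MASS::qda, which estimates a separate covariance matrix per group; the simulated data (one tight group, one diffuse group) is illustrative:

  library(MASS)

  set.seed(2)
  X   <- rbind(cbind(rnorm(60, 0, 0.5), rnorm(60, 0, 0.5)),   # tight group around (0, 0)
               cbind(rnorm(60, 1, 2.0), rnorm(60, 1, 2.0)))   # diffuse group around (1, 1)
  grp <- factor(rep(c("G1", "G2"), each = 60))

  fit_q <- qda(X, grouping = grp)   # quadratic rule: per-group covariance matrices
  fit_l <- lda(X, grouping = grp)   # linear rule: one pooled covariance matrix

  newx <- rbind(c(0, 0), c(1, 4))
  predict(fit_q, newx)$class        # quadratic decision boundary
  predict(fit_l, newx)$class        # linear boundary, for comparison

When the group covariances really are similar the two rules give essentially the same boundary; when they differ (as simulated here) the quadratic rule can separate points the linear one cannot.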

• The “quadratic” version is always called quadratic discriminant analysis, QDA

• The “linear” version is called by a number of names!
  • linear discriminant analysis, LDA
  • some combination of the above with the words Gaussian or classification

• A number of techniques use the name LDA!
  • Important to specify the equations used to tell the difference!

Bayes-Gaussian Discriminant Analysis

• Groups have similar covariance structure: the linear discriminant rule should work well

• Groups have different covariance structure: the quadratic discriminant rule may work better

• This supervised technique is called Linear Discriminant Analysis (LDA) in R
  • Also called Fisher linear discriminant analysis

• CVA is closely related to linear Bayes-Gaussian discriminant analysis

• Works on a principle similar to PCA: look for “interesting” directions in data space
  • CVA: find directions in space which best separate the groups
  • Technically: find directions which maximize the ratio of between-group to within-group variation

Canonical Variate Analysis

• Project on PC1: not necessarily good group separation!

• Project on CV1: good group separation!

• Note: there are #groups − 1 or p CVs, whichever is smaller

• Use the between-group to within-group covariance matrix, W⁻¹B, to find the directions of best group separation (the CVA loadings, A_CV): the loadings are the leading eigenvectors of W⁻¹B
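*A compact sketch of that eigendecomposition on R’s built-in iris data; the symbols W, B, and A_cv follow the slide notation, and everything else (data set, number of CVs kept) is an illustrative choice:

  X   <- as.matrix(iris[, 1:4])
  grp <- iris$Species

  mu_all <- colMeans(X)
  groups <- split(as.data.frame(X), grp)

  # Within-group (pooled) and between-group covariance matrices
  W <- Reduce(`+`, lapply(groups, function(g) (nrow(g) - 1) * cov(g))) /
       (nrow(X) - length(groups))
  B <- Reduce(`+`, lapply(groups, function(g) {
         d <- colMeans(g) - mu_all
         nrow(g) * tcrossprod(d)
       })) / (length(groups) - 1)

  # CVA loadings A_cv: leading eigenvectors of W^-1 B
  # (at most min(#groups - 1, p) meaningful CVs; 3 groups here -> 2 CVs)
  eig  <- eigen(solve(W) %*% B)
  A_cv <- Re(eig$vectors[, 1:2])
  cv_scores <- sweep(X, 2, mu_all) %*% A_cv   # project observations onto CV1 and CV2

MASS::lda finds essentially these same directions (its “LD” loadings), up to scaling and sign.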

• CVA can be used for dimension reduction.

• Caution! These “dimensions” are not at right angles (i.e. not orthogonal)

• CVA plots can thus be distorted from reality

• Always check loading angles!

• Caution! CVA will not work well with very correlated data
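*One way to act on the “check loading angles” advice, continuing from the A_cv matrix in the sketch above (the 180/pi factor just reports the angle in degrees):

  # Angle between the first two CVA loading vectors; 90 degrees would mean orthogonal
  v1 <- A_cv[, 1]
  v2 <- A_cv[, 2]
  cos_angle <- sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
  acos(cos_angle) * 180 / pi   # generally not 90: CVA axes are not orthogonal, so plots can distort distances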

Canonical Variate Analysis

[Figure: 2D CVA of the gasoline data set vs. 2D PCA of the gasoline data set]

• Distance metric used in CVA to assign the group identity of an unknown data point x: the squared Mahalanobis distance to each group mean, d_i²(x) = (x − μ_i)ᵀ W⁻¹ (x − μ_i); assign x to the group with the smallest distance

• If data is Gaussian and group covariance structures are the same then CVA classification is the same as Bayes-Gaussian classification.
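*A sketch of that nearest-group assignment with the stats function mahalanobis(), reusing W and the per-group means from the CVA sketch above; the “unknown” measurement vector is made up:

  # Squared Mahalanobis distance from an unknown point to each group mean, using W
  x_new <- c(6.0, 3.0, 4.5, 1.5)          # illustrative iris-scale measurement
  means <- t(sapply(groups, colMeans))    # one row of means per group
  d2    <- apply(means, 1, function(m) mahalanobis(x_new, center = m, cov = W))
  names(which.min(d2))                    # assign to the nearest group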


• PLS-DA is a supervised discrimination technique and very popular in chemometrics
  • Works well with highly correlated variables (like in spectroscopy)
  • Lots of correlation causes CVA to fail!

• Group labels are coded into a “response matrix” Y

• PLS searches for directions of maximum covariance in X and Y

• Loadings for X can be used like PCA loadings
  • Dimension reduction
  • Loading plots

Partial Least Squares Discriminant Analysis

• PLS-DA theory: find an (approximate) linear relationship between the experimental (explanatory) variables and the group labels (response variables):

  Y = X B + E        (X = experimental variables, Y = group labels, E = “error” or “residuals” matrix)

  X = T Pᵀ + E_X
  Y = U Qᵀ + E_Y     (T, U = PLS scores; P, Q = PLS loadings)

• So, substituting: U Qᵀ = T Pᵀ B + E

*Use these “Y-scores” (U) with a “soft-max” or “Bayes” rule to pick the “most-likely” group label
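*A sketch of this decomposition using plsr from the pls package on dummy-coded group labels; the “spectra” are simulated here, so the data, group names, and number of components are all illustrative:

  library(pls)

  # Simulated spectra: 40 samples x 30 highly correlated variables, two groups
  set.seed(3)
  base <- sin(seq(0, pi, length.out = 30))
  X    <- t(sapply(1:40, function(i) base * (1 + 0.1 * (i > 20)) + rnorm(30, sd = 0.02)))
  grp  <- factor(rep(c("A", "B"), each = 20))

  # Code the group labels into a response matrix Y: one indicator column per group
  Y <- model.matrix(~ grp - 1)

  fit <- plsr(Y ~ X, ncomp = 2)     # estimates the scores T, U and loadings P, Q
  scores(fit)[1:3, ]                # X-scores T: can be plotted like PCA scores
  loadings(fit)[1:5, ]              # X-loadings P: can be used like PCA loadings
  Y_hat <- predict(fit, ncomp = 2)  # fitted Y values, interpreted below to assign group labels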

Partial Least Squares Discriminant Analysis

• How do we solve this for T, P and U?

• Objective: maximize the covariance between the X and Y scores, T and U

• Various procedures to do this:

• Kernel-PLS

• SIMPLS

• NIPALS

• These give close, but slightly different, numerical results

• In R, the relevant functions are:

• plsr (pls package)

• spls (spls package)

• Easiest: plsda (caret package)
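*A minimal usage sketch with caret::plsda, reusing the simulated X and grp from the pls sketch above; the choice of two components is arbitrary:

  library(caret)

  # plsda wraps the dummy-coding of Y and the PLS fit in a single call
  fit <- plsda(X, grp, ncomp = 2)
  predict(fit, X)                    # hard group assignments
  predict(fit, X, type = "prob")     # per-group "probabilities" (soft-max or Bayes post-processing)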

Partial Least Squares Discriminant Analysis

[Figure: 2D PLS of the gasoline data set vs. 2D PCA of the gasoline data set]

• Group assignments of observation vectors are made by interpreting the Y scores

• Typically a “soft-max” function is used
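*A sketch of the soft-max step applied to one row of predicted Y values; the numbers are made up:

  # Soft-max turns a row of predicted Y values into pseudo-probabilities per group
  softmax <- function(y) exp(y) / sum(exp(y))
  y_hat   <- c(A = 0.9, B = 0.2)   # illustrative predicted Y scores for one observation
  p       <- softmax(y_hat)        # A ~ 0.67, B ~ 0.33
  names(which.max(p))              # assign the most likely group: "A"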

Partial Least Squares Discriminant Analysis

[Figure: Y-scores for the observation vectors]
