Bayesian Machine Learning and Its Application


Alan Qi
Feb. 23, 2009

Motivation

• Massive data from various sources: web pages, Facebook, high-throughput biological data, high-throughput chemical data, etc.

• Challenging goal: how to model complex systems and extract knowledge from data.

Bayesian machine learning

Bayesian learning methods: a principled way to fuse prior knowledge with new evidence from data
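In symbols (a standard statement of Bayes' rule, with θ the model parameters and D the observed data):

$$P(\theta \mid D) \;=\; \frac{P(D \mid \theta)\, P(\theta)}{P(D)} \;\propto\; P(D \mid \theta)\, P(\theta)$$

i.e., posterior ∝ likelihood (new evidence) × prior (existing knowledge).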

Key issues: model design and computation

Wide-ranging applications

Bayesian learning in practice

Applications:

Recommendation systems (Amazon, Netflix)

Text Parsing (Finding latent topics in documents)

Systems biology (where computation meets biology)

Computer vision (parsing handwritten diagrams automatically)

Wireless communications

Computational finance, …

Learning for biology: understanding gene regulation during organism development

[Figure: a protein, the product of Gene B, binds DNA near Gene A to regulate it]

Learning functionalities of genes for development

Inferring high-resolution protein-DNA binding locations from low-resolution measurement

Learning regulatory cascades during embryonic stem cell development


[Figure: gene-expression heatmaps (genes × time), one per condition: wild-type lineage, no C lineage, extra ‘C’ lineages]

Data: gene expression profiles from wild-types & mutants

(Baugh et al, 2005)

Bayesian semisupervised classification for finding tissue-specific genes

BGEN (Bayesian GENeralization from examples; Qi et al., Bioinformatics 2006)

[Diagram: labeled expression data + graph-based kernels → classifier]

(F. Chung, 1997; Zhu et al., 2003; Zhou et al., 2004)

Gaussian process classifier that is trained by expectation propagation (EP) and classifies the whole genome efficiently

Estimating noise and probe quality by approximate leave-one-out error
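For intuition, here is a minimal sketch of a graph-based kernel in the spirit of the citations above, not BGEN's exact construction; genes become graph nodes linked by expression similarity, and the parameter names are illustrative:

```python
import numpy as np

def regularized_laplacian_kernel(X, n_neighbors=10, sigma=1.0):
    """Graph-based kernel over gene expression profiles.

    A sketch in the spirit of Zhu et al. (2003) / Zhou et al. (2004):
    connect genes with similar expression, form the graph Laplacian L,
    and return the regularized inverse K = (L + I / sigma^2)^-1.
    X is an (n_genes, n_conditions) expression matrix.
    """
    n = X.shape[0]
    # Pairwise squared distances between expression profiles.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # k-nearest-neighbor graph with locally scaled Gaussian weights.
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]   # skip self at index 0
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (d2[i, nbrs].mean() + 1e-12))
    W = np.maximum(W, W.T)                            # symmetrize
    L = np.diag(W.sum(axis=1)) - W                    # combinatorial Laplacian
    return np.linalg.inv(L + np.eye(n) / sigma**2)    # kernel matrix
```

The resulting kernel then feeds the Gaussian process classifier, so predictions propagate from the few labeled genes to every node of the graph.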


Biological experiments support our predictions

[Figure: expression of K01A2.5 and R11A5.4 in C vs. non-C lineages and in muscle vs. epidermis; validation experiments from Ge’s lab]

Data: genomic sequences

RNA: messenger

Consensus sequences

• Useful for publication
• IUPAC symbols for degenerate sites
• Not very amenable to computation

Nature Biotechnology 24, 423 - 425 (2006)

Probabilistic Model

Position Frequency Matrix (PFM)

        1     2     3     4     5     6
  A    0.2   0.7   0.2   0.2   0.1   0.3
  C    0.2   0.1   0.2   0.4   0.5   0.4
  G    0.5   0.1   0.2   0.3   0.2   0.2
  T    0.1   0.1   0.4   0.1   0.2   0.1

Built over motif positions 1…K: count base frequencies at each position, then add pseudocounts. The motif model is M = (M1, …, MK), and Pk(S|M) gives the probability of base S at position k.
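A minimal sketch of the "count frequencies, add pseudocounts" recipe (the function name and interface are illustrative):

```python
import numpy as np

BASES = "ACGT"

def position_frequency_matrix(sites, pseudocount=1.0):
    """Build a PFM from aligned binding sites.

    Count base frequencies at each of the K positions, add
    pseudocounts, and normalize each column to sum to 1.
    `sites` is a list of equal-length strings over ACGT.
    """
    K = len(sites[0])
    counts = np.full((4, K), pseudocount)      # rows A, C, G, T
    for site in sites:
        for k, base in enumerate(site):
            counts[BASES.index(base), k] += 1
    return counts / counts.sum(axis=0)         # column k is Pk(S|M)
```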

Bayesian learning: Estimating motif models by Gibbs sampling

[Figure: likelihood surface P(Sequences | params1, params2) plotted over Parameter1 × Parameter2]

In theory, Gibbs sampling is less likely to get stuck in local maxima.
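To make the procedure concrete, here is a minimal site-sampler sketch of Gibbs motif discovery; the uniform 0.25 background, one motif occurrence per sequence, and all names here are illustrative assumptions, not necessarily the talk's exact variant:

```python
import numpy as np

BASES = "ACGT"

def gibbs_motif_sampler(seqs, K, n_iters=500, rng=None):
    """Resample each sequence's motif start given the others' sites."""
    rng = rng or np.random.default_rng(0)
    starts = [int(rng.integers(0, len(s) - K + 1)) for s in seqs]
    for _ in range(n_iters):
        for i in range(len(seqs)):
            # Build a PFM (with pseudocounts) from all sequences but i.
            counts = np.ones((4, K))
            for j, s in enumerate(seqs):
                if j != i:
                    for k in range(K):
                        counts[BASES.index(s[starts[j] + k]), k] += 1
            pfm = counts / counts.sum(axis=0)
            # Weight every candidate start in sequence i by its
            # likelihood ratio against the background ...
            s = seqs[i]
            w = np.array([
                np.prod([pfm[BASES.index(s[t + k]), k] / 0.25
                         for k in range(K)])
                for t in range(len(s) - K + 1)
            ])
            # ... and resample the start in proportion to that weight.
            starts[i] = int(rng.choice(len(w), p=w / w.sum()))
    return starts
```

Because the new start is sampled rather than maximized, the chain can escape poor configurations instead of climbing the nearest peak.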

Bayesian learning: Estimating motif models by expectation maximization

[Figure: likelihood surface P(Sequences | params1, params2) plotted over Parameter1 × Parameter2]

To minimize the effects of local maxima, you should search multiple times from different starting points, as sketched below.
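The restart strategy itself is easy to capture in code; `em_fit` below is an assumed interface standing in for any EM fitter that returns parameters and a log-likelihood:

```python
import numpy as np

def best_of_restarts(em_fit, data, n_restarts=20, seed=0):
    """Run EM from several random starting points, keep the best fit.

    `em_fit(data, rng)` is a hypothetical interface assumed to return
    (params, log_likelihood); EM converges to the local maximum nearest
    its initialization, so we keep the run with the highest likelihood.
    """
    runs = [em_fit(data, np.random.default_rng(seed + r))
            for r in range(n_restarts)]
    return max(runs, key=lambda run: run[1])
```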

Scoring a Sequence

$$\text{Score} \;=\; \log \frac{P(S \mid \text{PFM})}{P(S \mid B)} \;=\; \log \prod_{i=1}^{N} \frac{P(S_i \mid \text{PFM}_i)}{P(S_i \mid B)} \;=\; \sum_{i=1}^{N} \log \frac{P(S_i \mid \text{PFM}_i)}{P(S_i \mid B)}$$

To score a sequence, we compare it to a null (background) model.

Background DNA model B: each base equally likely, P(A) = P(C) = P(G) = P(T) = 0.25.

Taking the elementwise log-likelihood ratio of the PFM against the background (log2 of each frequency over 0.25) gives the Position Weight Matrix (PWM):

        1     2     3     4     5     6
  A   -0.3   1.4  -0.3  -0.3  -1.3   0.3
  C   -0.3  -1.3  -0.3   0.6   1.0   0.6
  G    1.0  -1.3  -0.3   0.3  -0.3  -0.3
  T   -1.3  -1.3   0.6  -1.3  -0.3  -1.3

Scoring a Sequence

MacIsaac & Fraenkel (2006) PLoS Comp Bio

Common threshold = 60% of maximum score
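Putting the pieces together, a minimal sketch of PWM scanning with the 60%-of-maximum threshold (the function name and interface are illustrative):

```python
import numpy as np

BASES = "ACGT"

def scan_sequence(seq, pfm, threshold_frac=0.60, background=0.25):
    """Score every window of a sequence against a PWM, report hits.

    PWM = log2(PFM / background); a window's score is the sum of its
    PWM entries, and a hit is any window scoring at least 60% of the
    maximum achievable score (the common threshold cited above).
    """
    pwm = np.log2(pfm / background)       # (4, K) position weight matrix
    K = pfm.shape[1]
    max_score = pwm.max(axis=0).sum()     # best possible base at each position
    hits = []
    for t in range(len(seq) - K + 1):
        score = sum(pwm[BASES.index(seq[t + k]), k] for k in range(K))
        if score >= threshold_frac * max_score:
            hits.append((t, score))
    return hits
```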

Visualizing Motifs: Motif Logos

Represent both base frequency and conservation at each position:

• Height of letter proportional to frequency of the base at that position

• Height of stack proportional to conservation at that position
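Letter and stack heights can be computed directly from the PFM. The sketch below follows the standard sequence-logo formulation (Schneider & Stephens), which may differ in detail from the talk's figures:

```python
import numpy as np

def logo_heights(pfm):
    """Letter heights for a motif logo from a (4, K) PFM.

    Stack height at position k is the column's information content,
    2 - H_k bits for DNA (H_k = Shannon entropy of the column); each
    letter's height is its frequency times the stack height.
    """
    p = np.clip(pfm, 1e-12, 1.0)                # avoid log2(0)
    entropy = -(p * np.log2(p)).sum(axis=0)     # H_k per column
    stack = 2.0 - entropy                       # conservation in bits
    return pfm * stack                          # (4, K) letter heights
```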

Software implementation: AlignACE

http://atlas.med.harvard.edu/cgi-bin/alignace.pl

• Implements Gibbs sampling for motif discovery
– Several enhancements

• ScanAce – look for motifs in a sequence given a model

• CompareAce – calculate “similarity” between two motifs (e.g., for clustering motifs)

Data: biological networks

Network Decomposition

• Infinite Non-negative Matrix Factorization

1. Formulate the discovery of network legos as a non-negative factorization problem

2. Develop a novel Bayesian model which automatically learns the number of bases (a rough sketch follows).
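The talk's model selects the number of bases by Bayesian inference; as a rough non-Bayesian stand-in, one can over-provision an ordinary NMF and prune bases that collapse. All names and thresholds here are illustrative:

```python
import numpy as np

def nmf_with_pruning(X, K_max=20, n_iters=500, tol=1e-3, seed=0):
    """Non-negative factorization X ~= Z A with crude rank selection.

    Start with K_max bases, run Lee-Seung multiplicative updates, then
    prune bases whose total weight collapses toward zero. The Bayesian
    model in the talk learns the number of bases by inference instead.
    """
    rng = np.random.default_rng(seed)
    Z = rng.random((X.shape[0], K_max))
    A = rng.random((K_max, X.shape[1]))
    eps = 1e-9
    for _ in range(n_iters):
        A *= (Z.T @ X) / (Z.T @ Z @ A + eps)    # update bases
        Z *= (X @ A.T) / (Z @ A @ A.T + eps)    # update weights
    keep = Z.sum(axis=0) * A.sum(axis=1) > tol  # bases that still matter
    return Z[:, keep], A[keep]
```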

Network Decomposition

• Synthetic network decomposition [result figures]

Data: movie ratings

• User-item matrix of ratings X

• Ratings range from 1 (not recommend) to 5 (recommend)

Task: how to predict user preference

• “Based on the premise that people looking for information should be able to make use of what others have already found and evaluated.” (Maltz & Ehrlich, 1995)

• E.g., you like movies A, B, C, D, and E; I like A, B, C, and D but have not seen E yet. What is my likely rating for E?

Collaborative filtering for recommendation systems

• Matrix factorization as a collaborative filtering approach:

X ≈ Z A where X is N by D, Z is N by K and A is K by D.

xi,j: user i’s rating on movie j

zi,k: user i’s interests in movie category k (e.g., action, thriller, comedy, romance, etc.)

Ak,j: how strongly movie j belongs to movie category k

Such that xi,j ≈ zi,1 A1,j + zi,2 A2,j + … + zi,K AK,j
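A plain least-squares version of this factorization, fit only on the observed entries, can be sketched as follows; the Bayesian treatment described next replaces this point estimate with posterior inference, and the names and hyperparameters here are illustrative:

```python
import numpy as np

def factorize_ratings(X, mask, K=5, n_iters=200, lr=0.01, reg=0.1, seed=0):
    """Fit X ~= Z A on the observed ratings only.

    X is N x D, mask is 1 where the rating is known and 0 elsewhere,
    Z is N x K (user interests), A is K x D (movie-category loadings).
    Gradient descent on the regularized squared error over known entries.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Z = 0.1 * rng.standard_normal((N, K))
    A = 0.1 * rng.standard_normal((K, D))
    for _ in range(n_iters):
        E = mask * (Z @ A - X)            # error on observed entries only
        Z -= lr * (E @ A.T + reg * Z)
        A -= lr * (Z.T @ E + reg * A)
    return Z, A                           # Z @ A predicts the missing ratings
```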

Bayesian learning of matrix factorization

• Training: use probability theory, in particular Bayesian inference, to learn the model parameters Z and A given the data X, which contains missing elements (unknown ratings)

• Prediction: use the estimated Z and A to predict the unknown ratings in X

Test results

• ‘Jester’ dataset: ratings mapped from [-10, 10] to [0, 20]

• 10 randomly chosen subsets, each with 1000 users; for each user we randomly hold out 10 ratings for testing

• Methods compared: IMF, INMF, and NMF (K = 2…9)

Collaborative Filtering

Task

• How to find latent topics and group documents, such as emails, papers, or news articles, into different clusters?

Data: text documents

[Figure: document-word matrix X, with documents spanning computer science papers and biology papers]

Assumptions

1. Keywords are shared across different documents on the same topic.

2. The more important a keyword is, the more frequently it appears.

Matrix factorization models (again)

X ≈ Z A

xi,j: the frequency with which word j appears in document i

zi,k: how much of document i's content relates to topic k (e.g., biology, computer science, etc.)

Ak,j: how important word j is to topic k

Bayesian Matrix Factorization

• We will use Bayesian methods again to estimate Z and A.

• Once Z and A are estimated, we can identify the hidden topics by examining A and cluster documents using Z, as in the sketch below.
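As a non-Bayesian stand-in for illustration, scikit-learn's NMF yields the same Z/A reading of topics and clusters (the function name is illustrative):

```python
from sklearn.decomposition import NMF

def cluster_documents(X, n_topics=2, seed=0):
    """Factor a document-word count matrix X ~= Z A and cluster.

    Row i of Z gives document i's topic weights; row k of A gives
    topic k's word importances. Each document is assigned to its
    highest-weight topic.
    """
    model = NMF(n_components=n_topics, init="nndsvda", random_state=seed)
    Z = model.fit_transform(X)       # (n_docs, n_topics)
    A = model.components_            # (n_topics, n_words)
    clusters = Z.argmax(axis=1)      # highest-weight topic per document
    return Z, A, clusters
```

Inspecting the largest entries in each row of A recovers the topic's characteristic keywords, matching the two assumptions above.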

Text Clustering

• ‘20 Newsgroups’ dataset: a subset of 815 articles and 477 words

Discovered hidden topics

Summary

• Bayesian machine learning: a powerful tool that enables computers to learn hidden relations from massive data and make sensible predictions.

• Applications in computational biology, e.g., gene expression analysis and motif discovery, and information extraction, e.g., text modeling.
