1/14/03 1 Math Models for Learning and Discovery Kristin P. Bennett Mathematical Sciences Department Rensselaer Polytechnic Institute


1/14/03 1

Math Models for Learning and Discovery

Kristin P. Bennett
Mathematical Sciences Department
Rensselaer Polytechnic Institute

1/18/05 2

The Learning Problem

The problem of understanding intelligence is said to be the greatest problem in science today and “the” problem for this century – as deciphering the genetic code was for the second half of the last one…the problem of learning represents a gateway to understanding intelligence in man and machines.

-- Tomaso Poggio and Stephen Smale, 2003

1/18/05 3

What do these problems have in common?

• Design and Discovery of Pharmaceuticals
• Target Marketing in Business
• Diagnosis of Breast Cancer
• Discovery of Novel Superconductors
• Detection of Anthrax using TZ spectroscopy
• Modeling and predicting global trade
• RNA Transcription

1/18/05 4

DRUG TRIVIA (2000 old info)

• In USA, $25B/yr for R&D of pharmaceuticals (33% clinicals)
• Worth their weight in gold
• 10-15 years from conception to market for a drug
• Development cost $0.5B/drug
• First-year sales > $1B/drug
• 1 drug approved/5000 compounds tested
• 1 out of 100 drugs succeeds to market
• 19 Alzheimer’s drugs in development
• 20,000,000 Americans with Alzheimer’s by 2050

1/18/05 5

1/18/05 6

HIV Reverse-Transcriptase Inhibition modeling:

We have a few molecules that have been tested:

Can we predict whether a new molecule will inhibit HIV?

TOWARDS TREATING THE HIV EPIDEMIC

[Figure: structural diagrams of the tested molecules, with substituent positions labeled R, R1, R2, and X]

1/18/05 7

The bioactivities of a small set of molecules

Many possible descriptors for each molecule:

• Molecular weight
• Electrostatic potential
• Ionization potential

Can we predict a molecule's bioactivity?

What do we know?

1/18/05 8

Database Marketing

Bank has $1.7 billion portfolio of home mortgages.

When a customer refinances, the bank may lose that customer.

Question: will a customer refinance?

If so, offer that customer a good deal on refinancing.

1/18/05 9

What do we know?

For many customers, we know if they refinanced or not.

We know attributes of the customer:

• Income
• Age
• Residential area
• Payment history

Can we predict behavior of future customers?

1/18/05 10

Breast Cancer Diagnosis

Fine needle aspirate of breast tumor.

Is tumor benign or malignant?

1/18/05 11

What do we know?

For patients in initial study, we know whether tumor was benign or malignant.

Have a digital image of tumor aspirate.

Know characteristics doctors look at:

• Uniformity of cell shape
• Uniformity of cell size
• Cell mitosis


1/18/05 13

Superconductivity

Superconductivity is the ability of a material to conduct current with no resistance and extremely low loss.

A few high temperature superconductors have been found.

What other compounds are superconductors?

1/18/05 14

Applications of Superconductivity:

Magnetic Resonance Imaging

1/18/05 15

Applications of Superconductivity

Maglev Trains

1/18/05 16

Applications of Superconductivity

• Very small and efficient motors
• Better power transmission cables
• Better cellular phone service

Find a cheap high-temperature superconductor and you will get the NOBEL PRIZE.

1/18/05 17

What do we know?

Many compounds have been tested to see if they are superconductors.

Many descriptors exist for these compounds based on molecular properties.

1/18/05 18

What do all these problems have in common?

Each problem

• Can be posed as a “yes” or “no” question.
• Has examples known to be of the “yes” type or the “no” type.

Each example has an associated set of descriptors.

Learn Classification Function!
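
To make the shared structure concrete, here is a minimal sketch of how one such yes/no problem could be stored: each example carries a list of numeric descriptors and a label. The `Example` class, the choice of +1/-1 for yes/no, and the customer values are all hypothetical illustrations, not data from the lecture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    descriptors: List[float]  # numeric attributes describing the example
    label: int                # assumed encoding: +1 for "yes", -1 for "no"


# Hypothetical toy values (income in $1000s, age in years); not real bank data.
customers = [
    Example([85.0, 34.0], +1),  # refinanced
    Example([42.0, 58.0], -1),  # did not refinance
    Example([97.0, 41.0], +1),
    Example([38.0, 63.0], -1),
]

# Split into a descriptor matrix X and a label vector y,
# the usual inputs to a learning method.
X = [e.descriptors for e in customers]
y = [e.label for e in customers]
```

The same layout would fit the other problems by swapping in molecular descriptors or cell characteristics for the customer attributes.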

1/18/05 19

Data Mining

• Each problem has data.
• Our job is to “mine” information from this data.
• Information depends on the question asked.
• In this case we must produce a predictive yes/no model (a.k.a. a classification model) based on the data.

1/18/05 20

Mathematical Model

Have data $(x_1, y_1), \ldots, (x_m, y_m)$

Construct predictive function $f(x) \approx y$

Solve mathematical model to find $f$:

$$\min_f \sum_{i=1}^{m} \left( f(x_i) - y_i \right)^2 + \|f\|_K^2$$

Want $f$ to generalize well on future data
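
As a rough illustration of solving such a model, the sketch below fits the simplest case: a one-dimensional linear function f(x) = w*x with squared loss plus a penalty on w^2 (the norm of f under the linear kernel K(x, x') = x*x'). The trade-off weight `lam`, the function names, and the toy data are assumptions made for this example; they do not come from the slides.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimize sum_i (w*x_i - y_i)^2 + lam*w^2 over the scalar w.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def objective(w, xs, ys, lam):
    """Value of the regularized least-squares objective at w."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) + lam * w * w


# Toy data lying exactly on the line y = 2x (illustrative only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_unreg = fit_ridge_1d(xs, ys, lam=0.0)   # 2.0: fits the data exactly
w_reg = fit_ridge_1d(xs, ys, lam=14.0)    # 1.0: the penalty shrinks w toward zero
```

A larger `lam` trades training fit for a smaller, simpler f, which is one way to aim for good generalization on future data.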

1/18/05 21

Types of Learning Problems

• Classification: $y_i = 1$ or $-1$
• Regression: $y_i \in \mathbb{R}$
• Clustering: $y_i$ unknown
• Ranking: $y_1 < y_2, \ldots, y_k < y_j$

1/18/05 22

Data Mining

Classification = yes/no models

• Start with examples of yes and no.
• Associate a set of descriptors with each example. Descriptors must be appropriate for the question you are asking.
• Construct a model to split the two sets.
• Use the model to predict new examples.
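
The steps above can be sketched end to end with a deliberately simple stand-in model. This example uses a nearest-centroid rule (a hyperplane halfway between the two class means), which is not the method developed in the course, only an easy way to show "split the two sets, then predict". All names and data points are hypothetical.

```python
def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def fit_centroid_classifier(X, y):
    """Return (w, b) for the hyperplane w.x + b = 0 bisecting the class means."""
    pos = mean([x for x, label in zip(X, y) if label == 1])
    neg = mean([x for x, label in zip(X, y) if label == -1])
    w = [p - n for p, n in zip(pos, neg)]           # points from "no" toward "yes"
    mid = [(p + n) / 2 for p, n in zip(pos, neg)]   # midpoint between the means
    b = -sum(wj * mj for wj, mj in zip(w, mid))
    return w, b


def predict(model, x):
    """Classify x as +1 ("yes") or -1 ("no") by the sign of w.x + b."""
    w, b = model
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Step 1 and 2: examples of "no" (-1) and "yes" (+1) with two descriptors each.
X = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]]
y = [-1, -1, -1, 1, 1, 1]

# Step 3: construct a model that splits the two sets.
model = fit_centroid_classifier(X, y)

# Step 4: use the model on new examples.
new_yes = predict(model, [6, 6])   # 1
new_no = predict(model, [0, 0])    # -1
```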

1/18/05 23

Learning Model

• What kind of learning task is it?
• What sort of f should we use? $f(x) = \sum_i \alpha_i K(x_i, x)$, where $K$ is the kernel function
• What loss function to use?
• What regularization function?
• How can we solve this learning model?
• How well will the model predict new points?
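
To show what a kernel-based f looks like in practice, the sketch below evaluates a weighted sum of kernel values between stored points and a new point. The Gaussian kernel, the coefficient name `alphas`, the width `sigma`, and the two stored points are assumptions chosen for illustration; the slides do not fix a particular kernel or coefficients.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||u - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def kernel_expansion(alphas, points, x, kernel=gaussian_kernel):
    """Evaluate f(x) = sum_i alphas[i] * kernel(points[i], x)."""
    return sum(a * kernel(p, x) for a, p in zip(alphas, points))


# Two stored points with opposite-sign coefficients (hypothetical values).
points = [[0.0, 0.0], [1.0, 0.0]]
alphas = [1.0, -1.0]

f_left = kernel_expansion(alphas, points, [0.0, 0.0])   # positive: nearer the first point
f_right = kernel_expansion(alphas, points, [1.0, 0.0])  # negative: nearer the second point
f_mid = kernel_expansion(alphas, points, [0.5, 0.0])    # 0.0: equidistant from both
```

In a learned model the coefficients would come from solving the optimization problem, and the sign of f(x) could serve as the yes/no prediction.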

1/18/05 24

Class information

See course web page http://www.rpi.edu/~bennek/class/mmld/index.htm

1/18/05 25

Assignment for Friday

Read and be prepared to discuss Chapter 1, Shawe-Taylor and Cristianini

Lecturer: Gautam Kunapuli