1/14/03 1
Math Models for Learning and Discovery
Kristin P. Bennett
Mathematical Sciences Department
Rensselaer Polytechnic Institute
1/18/05 2
The Learning Problem
The problem of understanding intelligence is said to be the greatest problem in science today and “the” problem for this century – as deciphering the genetic code was for the second half of the last one…the problem of learning represents a gateway to understanding intelligence in man and machines.
-- Tomaso Poggio and Steve Smale, 2003
1/18/05 3
What do these problems have in common?
• Design and Discovery of Pharmaceuticals
• Target Marketing in Business
• Diagnosis of Breast Cancer
• Discovery of Novel Superconductors
• Detection of Anthrax using TZ spectroscopy
• Modeling and predicting global trade
• RNA Transcription
1/18/05 4
DRUG TRIVIA (dated figures, circa 2000)
• In the USA, $25B/yr for R&D of pharmaceuticals (33% clinicals)
• Worth their weight in gold
• 10-15 years from conception to market for a drug
• Development cost $0.5B/drug
• First-year sales > $1B/drug
• 1 drug approved per 5000 compounds tested
• 1 out of 100 drugs makes it to market
• 19 Alzheimer's drugs in development
• 20,000,000 Americans with Alzheimer's by 2050
1/18/05 6
HIV Reverse-Transcriptase Inhibition modeling:
Have a few molecules that have been tested:
Can we predict if a new molecule will inhibit HIV?
TOWARDS TREATING THE HIV EPIDEMIC
[Figure: structural diagrams of the tested molecules]
1/18/05 7
What do we know?
The bioactivities of a small set of molecules.
Many possible descriptors for each molecule:
• Molecular Weight
• Electrostatic Potential
• Ionization Potential
Can we predict a molecule's bioactivity?
1/18/05 8
Database Marketing
Bank has $1.7 billion portfolio of home mortgages.
When a customer refinances, the bank may lose that customer.
Question: will a customer refinance?
If so, offer that customer a good deal on refinancing.
1/18/05 9
What do we know?
For many customers, we know if they refinanced or not.
We know attributes of each customer:
• Income
• Age
• Residential Area
• Payment History
Can we predict behavior of future customers?
1/18/05 10
Breast Cancer Diagnosis
Fine needle aspirate of breast tumor.
Is tumor benign or malignant?
1/18/05 11
What do we know?
For patients in initial study, we know whether tumor was benign or malignant.
Have a digital image of the tumor aspirate.
Know characteristics doctors look at:
• Uniformity of cell shape
• Uniformity of cell size
• Cell Mitosis
1/18/05 13
Superconductivity
Superconductivity is the ability of a material to conduct current with no resistance and extremely low loss.
A few high-temperature superconductors have been found.
What other compounds are superconductors?
1/18/05 16
Applications of Superconductivity
• Very small and efficient motors
• Better power transmission cables
• Better cellular phone service
Find a cheap high-temperature superconductor and you will get the NOBEL PRIZE.
1/18/05 17
What do we know?
Many compounds have been tested to see if they are superconductors.
Many descriptors exist for these compounds based on molecular properties.
1/18/05 18
What do all these problems have in common?
Each problem:
• Can be posed as a “yes” or “no” question.
• Has examples known to be of the “yes” type or the “no” type.
• Each example has an associated set of descriptors.
Learn a classification function!
1/18/05 19
Data Mining
• Each problem has data.
• Our job is to “mine” information from this data.
• Information depends on the question asked.
• In this case we must produce a predictive yes/no model (a.k.a. a classification model) based on the data.
1/18/05 20
Mathematical Model
Have data: $(x_1, y_1), \ldots, (x_m, y_m)$
Construct predictive function $f(x) \approx y$
Solve mathematical model to find $f$:
$$\min_f \sum_{i=1}^{m} \left(f(x_i) - y_i\right)^2 + \lambda \|f\|_K^2$$
Want $f$ to generalize well on future data
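As a rough illustration of the model on this slide, the sketch below minimizes a squared-error fit term plus a regularization term for the simplest possible function class, a one-dimensional linear model f(x) = w·x, where the norm penalty reduces to w². The data values, the weight `lam`, and the function names are assumptions made up for this example; they do not come from the course.

```python
def objective(w, xs, ys, lam):
    """Squared-error fit plus regularization for the linear model f(x) = w * x."""
    fit = sum((w * x - y) ** 2 for x, y in zip(xs, ys))
    return fit + lam * w ** 2


def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimizer, obtained by setting d(objective)/dw to zero."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


# Hypothetical training data (x_i, y_i) and regularization weight.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]
lam = 0.5

w_star = fit_ridge_1d(xs, ys, lam)
print("fitted w:", w_star)
print("objective at w*:", objective(w_star, xs, ys, lam))
```

Increasing `lam` shrinks `w_star` toward zero, trading training fit for a simpler function, which is one way to pursue the goal of generalizing well on future data.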
1/18/05 21
Types of Learning Problems
Classification: $y_i = 1$ or $-1$
Regression: $y_i \in \mathbb{R}$
Clustering: $y_i$ unknown
Ranking: $y_1 > y_2, \ldots, y_k > y_j$
1/18/05 22
Data Mining
Classification = yes/no models
• Start with examples of yes and no.
• Associate a set of descriptors with each example. Descriptors must be appropriate for the question you are asking.
• Construct a model to split the two sets.
• Use the model to predict new examples.
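The workflow above can be sketched with a deliberately simple stand-in model. The example below uses a nearest-centroid rule (not a method named in these slides) to split the "yes" (+1) and "no" (-1) examples with a hyperplane, then predicts new examples. The descriptor values are invented for illustration only.

```python
def centroid(points):
    """Component-wise mean of a list of descriptor vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def train(examples):
    """examples: list of (descriptor_vector, label) pairs with label +1 or -1."""
    pos = centroid([x for x, y in examples if y == 1])
    neg = centroid([x for x, y in examples if y == -1])
    # The separating hyperplane w.x + b = 0 is the perpendicular
    # bisector of the segment joining the two class centroids.
    w = [p - q for p, q in zip(pos, neg)]
    b = -sum(wi * (p + q) / 2 for wi, p, q in zip(w, pos, neg))
    return w, b


def predict(model, x):
    """Return +1 ("yes") or -1 ("no") for a new descriptor vector."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical examples: two descriptors per example, known yes/no labels.
examples = [
    ([1.0, 1.2], 1),
    ([0.8, 1.0], 1),
    ([3.0, 3.1], -1),
    ([3.2, 2.9], -1),
]

model = train(examples)
print(predict(model, [1.0, 1.0]))  # near the "yes" examples
print(predict(model, [3.0, 3.0]))  # near the "no" examples
```

If the descriptors carry no information about the question being asked, no choice of separating model will help, which is why the slide stresses choosing appropriate descriptors.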
1/18/05 23
Learning Model
• What kind of learning task is it?
• What sort of f should we use? Kernel function: $f(x) = \sum_i \alpha_i K(x_i, x)$
• What loss function to use?
• What regularization function?
• How can we solve this learning model?
• How well will the model predict new points?
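One possible set of answers to these questions, given here only as a sketch: a Gaussian kernel for K, squared-error loss, a squared-norm regularizer, and a direct linear solve for the coefficients of the kernel expansion. Under those assumptions the coefficients satisfy (K + λI)α = y. The kernel choice, the parameters `lam` and `sigma`, and the toy data are all assumptions of this example; a held-out test set gives a crude check on how well the model predicts new points.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel; an assumed choice of K for this sketch."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2 * sigma ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def fit(X, y, lam=0.1, sigma=1.0):
    """Coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam
    return solve(K, y)


def f(x, X, alpha, sigma=1.0):
    """Kernel expansion: sum_i alpha_i * K(x_i, x)."""
    return sum(a * gaussian_kernel(xi, x, sigma) for a, xi in zip(alpha, X))


# Hypothetical training and held-out data with +1 / -1 labels.
X_train = [[0.0], [0.5], [1.0], [3.0], [3.5], [4.0]]
y_train = [1, 1, 1, -1, -1, -1]
X_test = [[0.25], [3.75]]
y_test = [1, -1]

alpha = fit(X_train, y_train)
preds = [1 if f(x, X_train, alpha) >= 0 else -1 for x in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print("held-out predictions:", preds, "accuracy:", accuracy)
```

Other loss functions or regularizers change the optimization problem and generally need a different solver; the linear solve here is specific to squared-error loss.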
1/18/05 24
Class information
See course web page: http://www.rpi.edu/~bennek/class/mmld/index.htm