Upload
walter-carter
View
220
Download
0
Embed Size (px)
Citation preview
Slides atwww.cs.ucr.edu/~eamonn/public/cs170guest.ppt
Eamonn [email protected]
What do...What do...
• Bill Gates• Sir Richard Branson• Larry Page• Mohammed Saleh Bin Laden
……have in common? have in common?
They each wrote to me in the last 3 months about my paper on Bayesian classification of insects…
Let us
• Review Bayesian classification• Discuss the paper
Naïve Bayes ClassifierNaïve Bayes Classifier
Thomas Bayes1702 - 1761
Grasshoppers
KatydidsThe Classification ProblemThe Classification Problem(informal definition)
Given a collection of annotated data. In this case 5 instances Katydids of and five of Grasshoppers, decide what type of insect the unlabeled example is.
Katydid or Grasshopper?
Thorax Thorax LengthLength
Abdomen Abdomen LengthLength Antennae Antennae
LengthLength
MandibleMandibleSizeSize
SpiracleDiameter Leg Length
For any domain of interest, we can measure For any domain of interest, we can measure featuresfeatures
Color Color {Green, Brown, Gray, Other}{Green, Brown, Gray, Other} Has Wings?Has Wings?
An
tenn
a L
engt
hA
nte
nna
Len
gth
10
1 2 3 4 5 6 7 8 9 10
1
2
3
4
5
6
7
8
9
Grasshoppers Katydids
Abdomen LengthAbdomen Length
Let’s get lots more data…Let’s get lots more data…Let’s get lots more data…Let’s get lots more data…
An
tenn
a L
engt
hA
nte
nna
Len
gth
10
1 2 3 4 5 6 7 8 9 10
1
2
3
4
5
6
7
8
9
KatydidsGrasshoppers
With a lot of data, we can build a histogram. Let us With a lot of data, we can build a histogram. Let us just build one for “Antenna Length” for now…just build one for “Antenna Length” for now…
We can leave the histograms as they are, or we can summarize them with two normal distributions.
Let us us two normal distributions for ease of visualization in the following slides…
p(cj | d) = probability of class cj, given that we have observed dp(cj | d) = probability of class cj, given that we have observed d
3
Antennae length is 3
• We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it?
• We can just ask ourselves, give the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid.• There is a formal way to discuss the most probable classification…
10
2
P(Grasshopper | 3 ) = 10 / (10 + 2) = 0.833
P(Katydid | 3 ) = 2 / (10 + 2) = 0.166
3
Antennae length is 3
p(cj | d) = probability of class cj, given that we have observed dp(cj | d) = probability of class cj, given that we have observed d
9
3
P(Grasshopper | 7 ) = 3 / (3 + 9) = 0.250
P(Katydid | 7 ) = 9 / (3 + 9) = 0.750
7
Antennae length is 7
p(cj | d) = probability of class cj, given that we have observed dp(cj | d) = probability of class cj, given that we have observed d
66
P(Grasshopper | 5 ) = 6 / (6 + 6) = 0.500
P(Katydid | 5 ) = 6 / (6 + 6) = 0.500
5
Antennae length is 5
p(cj | d) = probability of class cj, given that we have observed dp(cj | d) = probability of class cj, given that we have observed d
Bayes ClassifiersBayes Classifiers
That was a visual intuition for a simple case of the Bayes classifier, also called:
• Idiot Bayes • Naïve Bayes• Simple Bayes
We are about to see some of the mathematical formalisms, and more examples, but keep in mind the basic idea.
Find out the probability of the previously unseen instance previously unseen instance belonging to each class, then simply pick the most probable class.
Bayes ClassifiersBayes Classifiers• Bayesian classifiers use Bayes theorem, which says
p(cj | d ) = p(d | cj ) p(cj) p(d)
• p(cj | d) = probability of instance d being in class cj, This is what we are trying to compute
• p(d | cj) = probability of generating instance d given class cj,
We can imagine that being in class cj, causes you to have feature d with some probability
• p(cj) = probability of occurrence of class cj,
This is just how frequent the class cj, is in our database
• p(d) = probability of instance d occurring
This can actually be ignored, since it is the same for all classes
Assume that we have two classes
c1 = malemale, and c2 = femalefemale.
We have a person whose sex we do not know, say “drew” or d.
Classifying drew as male or female is equivalent to asking is it more probable that drew is malemale or femalefemale, i.e which is greater p(malemale | drew) or p(femalefemale | drew)
p(malemale | drew) = p(drew | malemale ) p(malemale)
p(drew)
(Note: “Drew can be a male or female name”)
What is the probability of being called “drew” given that you are a male?
What is the probability of being a male?
What is the probability of being named “drew”? (actually irrelevant, since it is that same for all classes)
Drew Carey
Drew Barrymore
p(cj | d) = p(d | cj ) p(cj)
p(d)
Officer Drew
Name Sex
Drew MaleMale
Claudia FemaleFemale
Drew FemaleFemale
Drew FemaleFemale
Alberto MaleMale
Karin FemaleFemale
Nina FemaleFemale
Sergio MaleMale
This is Officer Drew (who arrested me in This is Officer Drew (who arrested me in 1997). Is Officer Drew a 1997). Is Officer Drew a MaleMale or or FemaleFemale??
Luckily, we have a small database with names and sex.
We can use it to apply Bayes rule…
p(malemale | drew) = 1/3 * 3/8 = 0.125
3/8 3/8
p(femalefemale | drew) = 2/5 * 5/8 = 0.250
3/8 3/8
Officer Drew
p(cj | d) = p(d | cj ) p(cj)
p(d)
Name Sex
Drew MaleMale
Claudia FemaleFemale
Drew FemaleFemale
Drew FemaleFemale
Alberto MaleMale
Karin FemaleFemale
Nina FemaleFemale
Sergio MaleMale
Officer Drew is more likely to be a FemaleFemale.
Officer Drew IS a female!Officer Drew IS a female!
Officer Drew
p(malemale | drew) = 1/3 * 3/8 = 0.125
3/8 3/8
p(femalefemale | drew) = 2/5 * 5/8 = 0.250
3/8 3/8
Name Over 170CM Eye Hair length Sex
Drew No Blue Short MaleMale
Claudia Yes Brown Long FemaleFemale
Drew No Blue Long FemaleFemale
Drew No Blue Long FemaleFemale
Alberto Yes Brown Short MaleMale
Karin No Blue Long FemaleFemale
Nina Yes Brown Short FemaleFemale
Sergio Yes Blue Long MaleMale
p(cj | d) = p(d | cj ) p(cj)
p(d)
So far we have only considered Bayes Classification when we have one attribute (the “antennae length”, or the “name”). But we may have many features.How do we use all the features?
• To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate
p(d|cj) = p(d1|cj) * p(d2|cj) * ….* p(dn|cj)
The probability of class cj generating instance d, equals….
The probability of class cj generating the observed value for feature 1, multiplied by..
The probability of class cj generating the observed value for feature 2, multiplied by..
• To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate
p(d|cj) = p(d1|cj) * p(d2|cj) * ….* p(dn|cj)
p(officer drew|cj) = p(over_170cm = yes|cj) * p(eye =blue|cj) * ….
Officer Drew is blue-eyed, over 170cm tall, and has long hair
p(officer drew| FemaleFemale) = 2/5 * 3/5 * ….
p(officer drew| MaleMale) = 2/3 * 2/3 * ….
p(d1|cj) p(d2|cj) p(dn|cj)
cjThe Naive Bayes classifiers is often represented as this type of graph…
Note the direction of the arrows, which state that each class causes certain features, with a certain probability
…
Naïve Bayes is fast and Naïve Bayes is fast and space efficientspace efficient
We can look up all the probabilities with a single scan of the database and store them in a (small) table…
Sex Over190cm
MaleMale Yes 0.15
No 0.85
FemaleFemale Yes 0.01
No 0.99
cj
…p(d1|cj) p(d2|cj) p(dn|cj)
Sex Long Hair
MaleMale Yes 0.05
No 0.95
FemaleFemale Yes 0.70
No 0.30
Sex
MaleMale
FemaleFemale
An obvious pointAn obvious point. I have used a . I have used a simple two class problem, and simple two class problem, and two possible values for each two possible values for each example, for my previous example, for my previous examples. However we can have examples. However we can have an arbitrary number of classes, or an arbitrary number of classes, or feature valuesfeature values
Animal Mass >10kg
CatCat Yes 0.15
No 0.85
DogDog Yes 0.91
No 0.09
PigPig Yes 0.99
No 0.01
cj
…p(d1|cj) p(d2|cj) p(dn|cj)
Animal
CatCat
DogDog
PigPig
Animal Color
CatCat Black 0.33
White 0.23
Brown 0.44
DogDog Black 0.97
White 0.03
Brown 0.90
PigPig Black 0.04
White 0.01
Brown 0.95
The Naïve Bayesian Classifier has a piecewise quadratic decision boundaryThe Naïve Bayesian Classifier has a piecewise quadratic decision boundary
GrasshoppersKatydids
Ants
Adapted from slide by Ricardo Gutierrez-Osuna
What is the Deadliest Animal?(not counting humans!)
What is the Deadliest Animal?
500+ per year
200+ per year10+ per
year
300+ per year
1,000,000+ per year
50,000+ per year
What is the Deadliest Animal?
50,000+ per year500+ per
year
200+ per year10+ per
year
300+ per year
1,000,000+ per year
One penny weights about the same as one thousand mosquitoes
How can something so small be so deadly?
What is Malaria?• Malaria is a disease that involves high
fevers, shaking chills, joint pain, flu-like symptoms. In some cases it can produce coma and death.
• There are more than 225 million cases of malaria each year, killing around 1-million people.
Where does Malaria come from?
Malaria has been known since ancient times.
Many believed it came from “bad air” (Italian: mala aria, “bad air”)
500 years ago, a handful of people believed that insects might be involved in human diseases.
Hortus Sanitatis
(The Garden of Health)
1497
It was Sir Ronald Ross, an British army surgeon working in India, who proved in 1897 that malaria is transmitted by mosquitoes.
Sir Ronald Ross received the 1902 Nobel Prize for Physiology or Medicine for his work
(This was somewhat controversial, as many others made similar discoveries around the same time )
Malaria Parasites
1st Vector
Initial Human
hostLiver
infection
Blood infection
2nd Vector
Next Human
host
Malaria Transmission Cycle
We get malaria from mosquitoes
We get malaria from
humans
• There are 3,528 kinds of mosquitoes• Only a handful of species take human blood• Only the females take human blood
• There are 100 trillion mosquitoes alive today
• Mosquitoes have been around for at least 100 million years
• We know this from fossil records/DNA studies
• Mosquitoes have spread malaria for at least 35 million years
• We know this from insects found in amber
The Mosquito
Where does malaria cause problems?
www.worldmapper.org
Given that we have known for over one hundred years how Malaria is spread, where is the magic pill or immunization?
For a variety of reasons, a cure or immunization continues to alluded mankind.
However there are some interventions that can help
In the 20th century, smallpox killed 400 million people worldwide, it is now eradicated.
Polio is almost eradicated.
Interventions to Mitigate Malaria
• The use of insecticidal treated mosquito nets
• Spraying of insecticides (including controversial chemicals such as DDT)
• Introduction of fish/turtles/crustaceans to eat mosquito larva
• The introduction of dragonflies which eat adult mosquitoes.
• Habitat reduction by draining ponds and pools
• Use of chemical films to reduce the surface tension of water (drowning the pupa).
• .. and hundreds more proven or tentative ideas
Interventions Cost Money!
• Even cheap solutions have hidden costs• Insecticidal treated mosquito nets are cheap to
make, but…
• To make mosquito nets work, you need educators, incentive programs, maintenance etc
“...aid agencies and non-governmental organizations are quietly grappling with a problem: Data suggest that nearly half of Africans who have access to the nets refuse to sleep under them” (LA Times
May-2-2010).
• We need to know where/when the problem is the greatest. Where are the insects?
Which insects are they?
When did they arrive?
• The classic solution?
Use sticky traps– Inaccurate– Costly– Long time lag
Planning interventions requires knowledge
My Research
• I believe that we can count and classify insects with sensors. The classification problem– Must be cheap– Must be low powered – Must be accurate
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5x 104-0.2
-0.1
0
0.1
0.2
One second of audio from our sensor. The Common Eastern Bumble Bee (Bombus impatiens) takes about one tenth of a second to pass the laser.
Background noise Bee begins to cross laser Bee has past though the laser
UCR Wingbeat UCR Wingbeat SensorSensor
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5x 10
4-0.2
-0.1
0
0.1
0.2
0 100 200 300 400 500 600 700 800 900 10000
1
2
3
4x 10
-3Single-Sided Amplitude Spectrum of Y(t)
Frequency (Hz)
|Y(f
)|One second of audio from the laser sensor. Only Bombus impatiens (Common Eastern Bumble Bee) is in the insectary.
Background noiseBee begins to cross laser
Peak at 197Hz
Harmonics
60Hz interference
0 100 200 300 400 500 600 700 800 900 10000
1
2
3
4x 10
-3
Frequency (Hz)
|Y(f
)|
0 100 200 300 400 500 600 700 800 900 1000
Frequency (Hz)
0 100 200 300 400 500 600 700 800 900 1000
Frequency (Hz)
0 100 200 300 400 500 600 700
Wing Beat Frequency Hz
0 100 200 300 400 500 600 700
Wing Beat Frequency Hz
400 500 600 700
Anopheles stephensi: Female
mean =475, Std = 30Aedes aegyptii : Female mean =567, Std = 43
517
𝑃ሺ𝐴𝑛𝑜𝑝ℎ𝑒𝑙𝑒𝑠ȁ�𝑤𝑖𝑛𝑔𝑏𝑒𝑎𝑡 = 500ሻ= 1ξ2𝜋 30𝑒−(500−475)22×302
If I see an insect with a wingbeat frequency of 500, what is it?
400 500 600 700
517
12.2% of the area under the pink curve
8.02% of the area under the red curve
What is the error rate?
Can we get more features?
A mosquito flying across
the laser beam
400 800 12000
Wingbeat frequency at
354 HzAmplitude Spectrum
Can we discriminate the
sex of mosquitoes? (here Aedes
aegypti )
Predicted Class
Can we discriminate mosquitoes from non-
mosquitoes?
Predicted Class
Can we discriminate between mosquito species?
Predicted Class
female male Ae. aegypti
Tipula An. gambiae
Ae. aegypti
Actual female
9,994 6 Actual Ae. aegypti
9,999 0 Actual An. gambia
e
9,979 14
Class male 3 9,997 Class Tipula 1 800 Class Ae. aegypti
21 9,986
17Bee count
21Mosquito
count
2134Housefly
count
17Bee count
21Mosquito
count
2134Housefly
count
Spay the dark red areas at dusk
Using Pesticides Using Pesticides IntelligentlyIntelligently
If we are not sure where the insects are…
If we know where the insects are…
High cost (fuel/pesticide)Damage to environment
Low costLittle damage to environment
You won 10000KCongratulations! You won the
Nigerian lottery!Please send me your name and
more than four hours!
Upcoming dentist appDear Eamonn
Don’t forget your dentist appoint Many thanks, joe
here's we hue opposings and ents thers that fled of the undispriz'd coward to
sufferLose Weight
Now!here's we hue opposings and ents thers that fled of the undispriz'd coward to
suffer
Make money fast!here's we hue opposings and ents thers that fled of the undispriz'd coward to
sufferUpcoming dentist app
Dear EamonnDon’t forget your dentist appoint next
Wednesday the 17
Upcoming dentist appDear Eamonn
Don’t forget your dentist appoint next Wednesday the 17
You won 10000KCongratulations! You won the
Nigerian lottery!Please send me your name and
address
Spam Filter
Spam Folder Email Folder
The holy grail of computational entomology
With a single click of a button, we can take action on
our emails/spam.
We can delete, forward, save them etc
17Bee count
21Mosquito
count
2134Housefly
count
Can we do this with insects?
Can we “delete” mosquitoes(Kill them)
Can we “forward” bees(Make them go North and pollinate almonds etc)
The holy grail of computational entomology
Controlling Insects with Computers
Insect cyborgs are cool, but not the answer.
We need to be able to control insects at a cost of say 1,000,000 insects per penny.
Controlling Insects with Computers
We already control some insects with technology/computers
We are exploiting the fact that moths evolved without artificial lights.
What else can we exploit?
What else can we exploit?
The more we learn about nature, the more we can improve human the human condition
10
1 2 3 4 5 6 7 8 9 10
123456789
100
10 20 30 40 50 60 70 80 90 100
10
20
30
40
50
60
70
80
90
10
1 2 3 4 5 6 7 8 9 10
123456789
Which of the “Pigeon Problems” can be Which of the “Pigeon Problems” can be solved by a decision tree?solved by a decision tree?
• Advantages:– Fast to train (single scan). Fast to classify – Not sensitive to irrelevant features– Handles real and discrete data– Handles streaming data well
• Disadvantages:– Assumes independence of features
Advantages/Disadvantages of Naïve BayesAdvantages/Disadvantages of Naïve Bayes
Summary of Naïve BayesSummary of Naïve Bayes
The Naïve Bayes classifier is a fantastic tool for solving real world problems.
The are lots of jobs, research opportunities and entrepreneurial opportunities for people that can apply it in real world situations.