Slides at eamonn/public/cs170guest.ppt Eamonn Keogh eamonn@cs.ucr.edu


Slides at www.cs.ucr.edu/~eamonn/public/cs170guest.ppt

Eamonn Keogh
eamonn@cs.ucr.edu

What do...

• Bill Gates
• Sir Richard Branson
• Larry Page
• Mohammed Saleh Bin Laden

…have in common?

They each wrote to me in the last 3 months about my paper on Bayesian classification of insects…

Let us

• Review Bayesian classification
• Discuss the paper

Naïve Bayes Classifier

Thomas Bayes, 1702 - 1761

The Classification Problem (informal definition)

[Figure: example images of Grasshoppers and Katydids]

Given a collection of annotated data (in this case five instances of Katydids and five of Grasshoppers), decide what type of insect the unlabeled example is.

Katydid or Grasshopper?

For any domain of interest, we can measure features:

• Thorax Length
• Abdomen Length
• Antennae Length
• Mandible Size
• Spiracle Diameter
• Leg Length
• Color {Green, Brown, Gray, Other}
• Has Wings?

[Figure: scatter plot of Antenna Length (vertical axis, 1 to 10) against Abdomen Length (horizontal axis, 1 to 10) for Grasshoppers and Katydids]

Let’s get lots more data…

[Figure: plot of Antenna Length (axes 1 to 10) with many more Katydids and Grasshoppers]

With a lot of data, we can build a histogram. Let us just build one for “Antenna Length” for now…

We can leave the histograms as they are, or we can summarize them with two normal distributions.

Let us use two normal distributions for ease of visualization in the following slides…

p(cj | d) = probability of class cj, given that we have observed d

[Figure: the two distributions, with antennae length 3 marked]

• We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it?

• We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid?

• There is a formal way to discuss the most probable classification…

[Figure: at antennae length 3, the Grasshopper distribution has height 10 and the Katydid distribution has height 2]

P(Grasshopper | 3) = 10 / (10 + 2) = 0.833

P(Katydid | 3) = 2 / (10 + 2) = 0.167

[Figure: at antennae length 7, the Grasshopper distribution has height 3 and the Katydid distribution has height 9]

P(Grasshopper | 7) = 3 / (3 + 9) = 0.250

P(Katydid | 7) = 9 / (3 + 9) = 0.750

[Figure: at antennae length 5, both distributions have height 6]

P(Grasshopper | 5) = 6 / (6 + 6) = 0.500

P(Katydid | 5) = 6 / (6 + 6) = 0.500
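The three visual examples all follow the same recipe: read off the height of each class at the observed value, then divide by the sum of the heights. A minimal Python sketch of that recipe follows; the function name `posterior_from_counts` and the `observations` dictionary are illustrative names, not part of the slides.

```python
def posterior_from_counts(counts):
    """Normalize per-class heights at one feature value into probabilities."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Heights used in the slides at antennae lengths 3, 7 and 5.
observations = {
    3: {"Grasshopper": 10, "Katydid": 2},
    7: {"Grasshopper": 3, "Katydid": 9},
    5: {"Grasshopper": 6, "Katydid": 6},
}

for length, counts in observations.items():
    probs = posterior_from_counts(counts)
    # On a tie (length 5) max() simply returns the first class listed.
    best = max(probs, key=probs.get)
    print(length, {k: round(v, 3) for k, v in probs.items()}, best)
```

At length 5 the two classes are equally probable, so any tie-breaking rule is an arbitrary choice.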

Bayes Classifiers

That was a visual intuition for a simple case of the Bayes classifier, also called:

• Idiot Bayes
• Naïve Bayes
• Simple Bayes

We are about to see some of the mathematical formalisms, and more examples, but keep in mind the basic idea.

Find out the probability of the previously unseen instance belonging to each class, then simply pick the most probable class.

• Bayesian classifiers use Bayes theorem, which says

p(cj | d) = p(d | cj) p(cj) / p(d)

• p(cj | d) = probability of instance d being in class cj. This is what we are trying to compute.

• p(d | cj) = probability of generating instance d given class cj. We can imagine that being in class cj causes you to have feature d with some probability.

• p(cj) = probability of occurrence of class cj. This is just how frequent the class cj is in our database.

• p(d) = probability of instance d occurring. This can actually be ignored, since it is the same for all classes.

Assume that we have two classes

c1 = male, and c2 = female.

We have a person whose sex we do not know, say “drew” or d.

Classifying drew as male or female is equivalent to asking whether it is more probable that drew is male or female, i.e. which is greater: p(male | drew) or p(female | drew)?

p(male | drew) = p(drew | male) p(male) / p(drew)

(Note: “Drew” can be a male or female name)

• p(drew | male): What is the probability of being called “drew” given that you are a male?

• p(male): What is the probability of being a male?

• p(drew): What is the probability of being named “drew”? (actually irrelevant, since it is the same for all classes)

Drew Carey

Drew Barrymore

p(cj | d) = p(d | cj) p(cj) / p(d)

Officer Drew

Name      Sex
Drew      Male
Claudia   Female
Drew      Female
Drew      Female
Alberto   Male
Karin     Female
Nina      Female
Sergio    Male

This is Officer Drew (who arrested me in 1997). Is Officer Drew a Male or Female?

Luckily, we have a small database with names and sex.

We can use it to apply Bayes rule…

p(male | drew) = (1/3 * 3/8) / (3/8) = 0.125 / (3/8)

p(female | drew) = (2/5 * 5/8) / (3/8) = 0.250 / (3/8)


Officer Drew is more likely to be a Female.

Officer Drew IS a female!
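The Officer Drew calculation can be reproduced directly from the eight-row names database. The sketch below is one way to do it in Python; the function name `posterior` and the variable names are illustrative, and the code assumes the queried name appears at least once in the database (otherwise p(d) would be zero).

```python
from collections import Counter

# The small names database from the slide: (name, sex) pairs.
database = [
    ("Drew", "Male"), ("Claudia", "Female"), ("Drew", "Female"),
    ("Drew", "Female"), ("Alberto", "Male"), ("Karin", "Female"),
    ("Nina", "Female"), ("Sergio", "Male"),
]


def posterior(name, records):
    """Return p(sex | name) for each sex, using Bayes rule on raw counts."""
    class_counts = Counter(sex for _, sex in records)
    name_count = sum(1 for n, _ in records if n == name)
    p_name = name_count / len(records)                 # p(d)
    result = {}
    for sex, n_class in class_counts.items():
        n_both = sum(1 for n, s in records if n == name and s == sex)
        likelihood = n_both / n_class                  # p(d | cj)
        prior = n_class / len(records)                 # p(cj)
        result[sex] = likelihood * prior / p_name
    return result


probs = posterior("Drew", database)
print(probs)
print(max(probs, key=probs.get))
```

Dividing by p(drew) = 3/8 does not change which class wins; it only rescales the two numerators (0.125 and 0.250) so that they sum to one.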


Name      Over 170cm   Eye     Hair length   Sex
Drew      No           Blue    Short         Male
Claudia   Yes          Brown   Long          Female
Drew      No           Blue    Long          Female
Drew      No           Blue    Long          Female
Alberto   Yes          Brown   Short         Male
Karin     No           Blue    Long          Female
Nina      Yes          Brown   Short         Female
Sergio    Yes          Blue    Long          Male

p(cj | d) = p(d | cj) p(cj) / p(d)

So far we have only considered Bayes Classification when we have one attribute (the “antennae length”, or the “name”). But we may have many features. How do we use all the features?

• To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions (given the class), and thereby estimate

p(d|cj) = p(d1|cj) * p(d2|cj) * … * p(dn|cj)

The probability of class cj generating instance d equals…

the probability of class cj generating the observed value for feature 1, multiplied by…

the probability of class cj generating the observed value for feature 2, multiplied by…

p(officer drew|cj) = p(over_170cm = yes|cj) * p(eye = blue|cj) * ….

Officer Drew is blue-eyed, over 170cm tall, and has long hair

p(officer drew | Female) = 2/5 * 3/5 * ….

p(officer drew | Male) = 2/3 * 2/3 * ….
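The product rule above can be sketched in a few lines of Python using the eight-row table with three features. This is a minimal illustration, not a full implementation: the names `FEATURES`, `train` and `score` are hypothetical, and no smoothing is applied, so a feature value never seen with a class would zero out that class's whole product.

```python
from collections import Counter, defaultdict

FEATURES = ("over_170cm", "eye", "hair_length")

# Rows from the slide's table: (over_170cm, eye, hair_length, sex).
rows = [
    ("No", "Blue", "Short", "Male"),
    ("Yes", "Brown", "Long", "Female"),
    ("No", "Blue", "Long", "Female"),
    ("No", "Blue", "Long", "Female"),
    ("Yes", "Brown", "Short", "Male"),
    ("No", "Blue", "Long", "Female"),
    ("Yes", "Brown", "Short", "Female"),
    ("Yes", "Blue", "Long", "Male"),
]


def train(rows):
    """Single scan: count classes and (feature, value) pairs per class."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)
    for *values, label in rows:
        class_counts[label] += 1
        for feature, value in zip(FEATURES, values):
            value_counts[label][(feature, value)] += 1
    return class_counts, value_counts


def score(instance, class_counts, value_counts):
    """Return p(d|cj) * p(cj) per class; p(d) is dropped since it is shared."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_class in class_counts.items():
        s = n_class / total                              # p(cj)
        for feature, value in zip(FEATURES, instance):
            s *= value_counts[label][(feature, value)] / n_class
        scores[label] = s
    return scores


class_counts, value_counts = train(rows)
officer_drew = ("Yes", "Blue", "Long")   # over 170cm, blue-eyed, long hair
scores = score(officer_drew, class_counts, value_counts)
print(scores, max(scores, key=scores.get))
```

The first two factors computed for each class match the slide (2/5 * 3/5 for Female, 2/3 * 2/3 for Male); the hair-length factor is taken from the same table.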

[Figure: graph with a class node cj and an arrow from it to each of the feature nodes p(d1|cj), p(d2|cj), …, p(dn|cj)]

The Naïve Bayes classifier is often represented as this type of graph…

Note the direction of the arrows, which state that each class causes certain features, with a certain probability

Naïve Bayes is fast and space efficient

We can look up all the probabilities with a single scan of the database and store them in a (small) table…

Sex      Over 190cm   Probability
Male     Yes          0.15
Male     No           0.85
Female   Yes          0.01
Female   No           0.99

Sex      Long Hair    Probability
Male     Yes          0.05
Male     No           0.95
Female   Yes          0.70
Female   No           0.30

[Figure: the class node cj (Sex: Male, Female) points to one such table per feature: p(d1|cj), p(d2|cj), …, p(dn|cj)]
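Once the tables are stored, classifying a new instance is just a handful of lookups and multiplications. The sketch below uses the two tables from the slide. The slide gives no class priors, so equal priors are assumed here purely for illustration; the names `TABLES`, `PRIORS` and `classify` are also illustrative.

```python
# Conditional probability tables copied from the slide: p(value | sex).
TABLES = {
    "over_190cm": {"Male": {"Yes": 0.15, "No": 0.85},
                   "Female": {"Yes": 0.01, "No": 0.99}},
    "long_hair": {"Male": {"Yes": 0.05, "No": 0.95},
                  "Female": {"Yes": 0.70, "No": 0.30}},
}

# ASSUMPTION: the slide gives no class priors, so equal priors are used.
PRIORS = {"Male": 0.5, "Female": 0.5}


def classify(instance):
    """Return normalized p(sex | instance) using only table lookups."""
    scores = {}
    for label, prior in PRIORS.items():
        s = prior
        for feature, value in instance.items():
            s *= TABLES[feature][label][value]
        scores[label] = s
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


result = classify({"over_190cm": "Yes", "long_hair": "No"})
print(result)
```

The storage cost is one small table per feature, regardless of how many rows the original database had.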

An obvious point. I have used a simple two class problem, and two possible values for each feature, for my previous examples. However we can have an arbitrary number of classes, or feature values

Animal   Mass >10kg   Probability
Cat      Yes          0.15
Cat      No           0.85
Dog      Yes          0.91
Dog      No           0.09
Pig      Yes          0.99
Pig      No           0.01

Animal   Color   Probability
Cat      Black   0.33
Cat      White   0.23
Cat      Brown   0.44
Dog      Black   0.97
Dog      White   0.03
Dog      Brown   0.90
Pig      Black   0.04
Pig      White   0.01
Pig      Brown   0.95

[Figure: the class node cj (Animal: Cat, Dog, Pig) points to one such table per feature: p(d1|cj), p(d2|cj), …, p(dn|cj)]

The Naïve Bayesian Classifier has a piecewise quadratic decision boundary

[Figure: decision regions for Grasshoppers, Katydids and Ants]

Adapted from slide by Ricardo Gutierrez-Osuna

What is the Deadliest Animal? (not counting humans!)

[Figure: six animals labeled with the human deaths they cause: 10+ per year, 200+ per year, 300+ per year, 500+ per year, 50,000+ per year, 1,000,000+ per year]

One penny weighs about the same as one thousand mosquitoes

How can something so small be so deadly?

What is Malaria?

• Malaria is a disease that involves high fevers, shaking chills, joint pain, flu-like symptoms. In some cases it can produce coma and death.

• There are more than 225 million cases of malaria each year, killing around 1 million people.

Where does Malaria come from?

Malaria has been known since ancient times.

Many believed it came from “bad air” (Italian: mala aria, “bad air”)

500 years ago, a handful of people believed that insects might be involved in human diseases. 

Hortus Sanitatis (The Garden of Health), 1497

It was Sir Ronald Ross, a British army surgeon working in India, who proved in 1897 that malaria is transmitted by mosquitoes.

Sir Ronald Ross received the 1902 Nobel Prize for Physiology or Medicine for his work

(This was somewhat controversial, as many others made similar discoveries around the same time)

Malaria Transmission Cycle

[Figure: malaria parasites pass from the 1st vector to the initial human host (liver infection, then blood infection), then to the 2nd vector and on to the next human host]

We get malaria from mosquitoes

We get malaria from humans

• There are 3,528 kinds of mosquitoes

• Only a handful of species take human blood

• Only the females take human blood

• There are 100 trillion mosquitoes alive today

• Mosquitoes have been around for at least 100 million years

• We know this from fossil records/DNA studies

• Mosquitoes have spread malaria for at least 35 million years

• We know this from insects found in amber

The Mosquito

Where does malaria cause problems?

www.worldmapper.org

Given that we have known for over one hundred years how Malaria is spread, where is the magic pill or immunization?

For a variety of reasons, a cure or immunization continues to elude mankind.

However there are some interventions that can help

In the 20th century, smallpox killed 400 million people worldwide; it is now eradicated.

Polio is almost eradicated.

Interventions to Mitigate Malaria

• The use of insecticide-treated mosquito nets

• Spraying of insecticides (including controversial chemicals such as DDT)

• Introduction of fish/turtles/crustaceans to eat mosquito larvae

• The introduction of dragonflies which eat adult mosquitoes.

• Habitat reduction by draining ponds and pools

• Use of chemical films to reduce the surface tension of water (drowning the pupae).

• … and hundreds more proven or tentative ideas

Interventions Cost Money!

• Even cheap solutions have hidden costs

• Insecticide-treated mosquito nets are cheap to make, but…

• To make mosquito nets work, you need educators, incentive programs, maintenance etc.

“...aid agencies and non-governmental organizations are quietly grappling with a problem: Data suggest that nearly half of Africans who have access to the nets refuse to sleep under them” (LA Times, May-2-2010).

Planning interventions requires knowledge

• We need to know where/when the problem is the greatest. Where are the insects? Which insects are they? When did they arrive?

• The classic solution? Use sticky traps
  – Inaccurate
  – Costly
  – Long time lag

My Research

• I believe that we can count and classify insects with sensors. The classification problem
  – Must be cheap
  – Must be low powered
  – Must be accurate

UCR Wingbeat Sensor

[Figure: sensor signal (amplitude from -0.2 to 0.2) plotted over about 4.5 x 10^4 samples, annotated: background noise; bee begins to cross laser; bee has passed through the laser]

One second of audio from our sensor. The Common Eastern Bumble Bee (Bombus impatiens) takes about one tenth of a second to pass the laser.

[Figure: one second of audio from the laser sensor, with only Bombus impatiens (Common Eastern Bumble Bee) in the insectary, and below it the Single-Sided Amplitude Spectrum of Y(t), |Y(f)| against Frequency (Hz) from 0 to 1000, annotated: peak at 197Hz; harmonics; 60Hz interference]
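A single-sided amplitude spectrum like the one on this slide can be sketched with a direct discrete Fourier transform. The example below is only an illustration on a synthetic signal: the sample rate, the amplitudes, and the names `SAMPLE_RATE`, `signal` and `single_sided_amplitude` are all assumptions, not properties of the real sensor. It uses a slow O(N^2) transform from the standard library to stay self-contained; an FFT routine would normally be used instead.

```python
import cmath
import math

SAMPLE_RATE = 1000   # samples per second (assumed for this synthetic example)
N = SAMPLE_RATE      # one second of signal, so each DFT bin is 1 Hz wide

# Synthetic stand-in for the sensor audio: a 197 Hz wingbeat tone plus
# weaker 60 Hz interference. The amplitudes are made up for illustration.
signal = [
    0.1 * math.sin(2 * math.pi * 197 * n / SAMPLE_RATE)
    + 0.02 * math.sin(2 * math.pi * 60 * n / SAMPLE_RATE)
    for n in range(N)
]


def single_sided_amplitude(x):
    """Return |Y(f)| for bins 0 .. N/2 using a direct O(N^2) DFT."""
    n_samples = len(x)
    spectrum = []
    for k in range(n_samples // 2 + 1):
        total = sum(
            x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        # Double every bin except DC and Nyquist to fold in negative frequencies.
        scale = 1 if k in (0, n_samples // 2) else 2
        spectrum.append(scale * abs(total) / n_samples)
    return spectrum


amplitude = single_sided_amplitude(signal)
# Skip the DC bin when looking for the dominant frequency.
peak_hz = max(range(1, len(amplitude)), key=amplitude.__getitem__)
print(peak_hz, round(amplitude[peak_hz], 3))
```

The location of the largest peak is the kind of single number (a wingbeat frequency) that the classifier in the following slides takes as its feature.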

[Figure: amplitude spectra |Y(f)| against Frequency (Hz), 0 to 1000, and histograms of Wing Beat Frequency (Hz), 0 to 700]

Anopheles stephensi: Female, mean = 475, Std = 30

Aedes aegypti: Female, mean = 567, Std = 43

[Figure: the two normal curves over wingbeat frequencies 400 to 700, with 517 marked]

$$P(\mathit{Anopheles} \mid \mathit{wingbeat} = 500) = \frac{1}{\sqrt{2\pi}\,30}\, e^{-\frac{(500-475)^2}{2 \times 30^2}}$$

If I see an insect with a wingbeat frequency of 500, what is it?

[Figure: the two normal curves over wingbeat frequencies 400 to 700, split at 517]

12.2% of the area under the pink curve

8.02% of the area under the red curve

What is the error rate?
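The wingbeat example can be worked through numerically with the two normal distributions given above. The sketch below assumes equal priors for the two species, which the slides do not state, and takes 517 as the decision threshold shown in the figure; the function and variable names are illustrative.

```python
import math

# Means and standard deviations from the slide (female wingbeat frequency, Hz).
SPECIES = {
    "Anopheles stephensi": (475.0, 30.0),
    "Aedes aegypti": (567.0, 43.0),
}


def normal_pdf(x, mean, std):
    """Height of the normal curve at x."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)


def normal_cdf(x, mean, std):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))


def classify(wingbeat):
    """Normalized class probabilities. ASSUMPTION: equal priors."""
    densities = {name: normal_pdf(wingbeat, m, s) for name, (m, s) in SPECIES.items()}
    total = sum(densities.values())
    return {name: d / total for name, d in densities.items()}


posterior = classify(500)
best = max(posterior, key=posterior.get)

# Misclassified area on each side of the threshold marked in the figure.
THRESHOLD = 517
anopheles_above = 1 - normal_cdf(THRESHOLD, *SPECIES["Anopheles stephensi"])
aedes_below = normal_cdf(THRESHOLD, *SPECIES["Aedes aegypti"])
# With equal priors, the overall error rate is the average of the two areas.
error_rate = 0.5 * (anopheles_above + aedes_below)

print(best, round(posterior[best], 3))
print(round(anopheles_above, 4), round(aedes_below, 4), round(error_rate, 4))
```

Under these assumptions the two misclassified areas come out at roughly 8% and 12%, in line with the figures on the slide, so the single wingbeat feature alone gives an error rate of about one in ten.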

Can we get more features?

[Figure: a mosquito flying across the laser beam, and the Amplitude Spectrum of the signal (0 to 1200 Hz) showing a wingbeat frequency at 354 Hz]

Can we discriminate the sex of mosquitoes? (here Aedes aegypti)

                     Predicted female   Predicted male
Actual female        9,994              6
Actual male          3                  9,997

Can we discriminate mosquitoes from non-mosquitoes?

                     Predicted Ae. aegypti   Predicted Tipula
Actual Ae. aegypti   9,999                   0
Actual Tipula        1                       800

Can we discriminate between mosquito species?

                     Predicted An. gambiae   Predicted Ae. aegypti
Actual An. gambiae   9,979                   14
Actual Ae. aegypti   21                      9,986
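The confusion matrices above can be summarized as a single accuracy figure each. The sketch below reads the counts as rows = actual class and columns = predicted class, which is how the slide labels them; the names `CONFUSION` and `accuracy` are illustrative.

```python
# Confusion matrices from the slide: rows are actual class, columns predicted.
CONFUSION = {
    "sex (female/male)": [[9994, 6], [3, 9997]],
    "Ae. aegypti vs Tipula": [[9999, 0], [1, 800]],
    "An. gambiae vs Ae. aegypti": [[9979, 14], [21, 9986]],
}


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for task, matrix in CONFUSION.items():
    print(task, round(accuracy(matrix), 5))
```

Accuracy alone hides which direction the errors go, which is why the full matrices are worth keeping alongside it.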

Using Pesticides Intelligently

[Figure: sensor readouts: Bee count 17, Mosquito count 21, Housefly count 2134]

Spray the dark red areas at dusk

If we are not sure where the insects are…
• High cost (fuel/pesticide)
• Damage to environment

If we know where the insects are…
• Low cost
• Little damage to environment

Spam Filter

[Figure: a spam filter sorting incoming messages into a Spam Folder and an Email Folder. Spam examples: “You won 10000K. Congratulations! You won the Nigerian lottery! Please send me your name and address”, “Lose Weight Now!”, “Make money fast!”. Email example: “Upcoming dentist app. Dear Eamonn, Don’t forget your dentist appoint next Wednesday the 17. Many thanks, joe”]

The holy grail of computational entomology

With a single click of a button, we can take action on our emails/spam.

We can delete, forward, save them etc.

[Figure: sensor readouts: Bee count 17, Mosquito count 21, Housefly count 2134]

Can we do this with insects?

Can we “delete” mosquitoes? (Kill them)

Can we “forward” bees? (Make them go North and pollinate almonds etc.)

Controlling Insects with Computers

Insect cyborgs are cool, but not the answer.

We need to be able to control insects at a cost of say 1,000,000 insects per penny.


We already control some insects with technology/computers

We are exploiting the fact that moths evolved without artificial lights.

What else can we exploit?


The more we learn about nature, the more we can improve the human condition

Which of the “Pigeon Problems” can be solved by a decision tree?

[Figure: three scatter plots, with axes running from 1 to 10, from 10 to 100, and from 1 to 10]

Advantages/Disadvantages of Naïve Bayes

• Advantages:
  – Fast to train (single scan). Fast to classify
  – Not sensitive to irrelevant features
  – Handles real and discrete data
  – Handles streaming data well

• Disadvantages:
  – Assumes independence of features

Summary of Naïve Bayes

The Naïve Bayes classifier is a fantastic tool for solving real world problems.

There are lots of jobs, research opportunities and entrepreneurial opportunities for people who can apply it in real world situations.
