Project II Data Mining a Mushroom Dataset Group 1 Raymond Borges Jarilyn Hernandez

Project 2 Data Mining Part 1

Page 1: Project 2 Data Mining Part 1

Project II: Data Mining a Mushroom Dataset

Group 1: Raymond Borges, Jarilyn Hernandez

Page 2: Project 2 Data Mining Part 1

The Mushroom Dataset

Data Set Characteristics: Multivariate
Number of Instances: 8124
Area: Life
Attribute Characteristics: Categorical
Number of Attributes: 22
Date Donated: 1987

This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one.

Page 3: Project 2 Data Mining Part 1

Mushroom Dataset

22 independent attributes

1 class attribute (Can you eat it?)

Edible: 4,208 (51.8%)

Poisonous: 3,916 (48.2%)

Page 4: Project 2 Data Mining Part 1

Mushroom Dataset

22 attributes total

18 intrinsic to the mushroom

4 others: habitat, population, bruises, odor

Page 5: Project 2 Data Mining Part 1

Odor attribute, 1R Learner

The simplest rule: 98.52% accuracy

A = almond

C = creosote

F = foul

L = anise

M = musty

N = none

P = pungent

S = spicy

Y = fishy

[Chart: odor values a, c, f, l, m, n, p, s, y on the horizontal axis]
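As a rough illustration of what a 1R learner does with a single attribute such as odor, the sketch below maps each attribute value to its majority class and reports the training accuracy of that one rule. The function name `one_r` and the toy rows are hypothetical and are not taken from the mushroom dataset; a full 1R learner would repeat this step for every attribute and keep the best one.

```python
from collections import Counter, defaultdict


def one_r(rows, attribute, label="class"):
    """Build a one-attribute rule: each value predicts its majority class."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[label]] += 1
    # For every attribute value, keep the most frequent class.
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Training accuracy: rows whose class matches the rule's prediction.
    correct = sum(c[rule[value]] for value, c in counts.items())
    return rule, correct / len(rows)


# Hypothetical toy rows (e = edible, p = poisonous), for illustration only.
toy = [
    {"odor": "a", "class": "e"},
    {"odor": "l", "class": "e"},
    {"odor": "f", "class": "p"},
    {"odor": "f", "class": "p"},
    {"odor": "n", "class": "e"},
    {"odor": "n", "class": "e"},
    {"odor": "n", "class": "p"},
    {"odor": "y", "class": "p"},
]

rule, accuracy = one_r(toy, "odor")
print(rule)
print(f"training accuracy: {accuracy:.2%}")
```

On the toy rows, the single "none" mushroom that is poisonous is the only error, which mirrors how the odor rule on the slides misses a small number of odorless poisonous cases.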

Page 6: Project 2 Data Mining Part 1

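J48 tree: 100% classification accuracy

[Figure: J48 decision tree. Leaves are labeled E = edible or P = poisonous. The branch labels that survive from the figure are odor (almond, anise, creosote, fishy, foul, musty, none, pungent, spicy), spore-print color (black, brown, buff, chocolate, green, orange, purple, white, yellow), gill size (broad, narrow), gill spacing (close, crowded, distant), and population (abundant, clustered, numerous, scattered, several, solitary).]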
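J48 is Weka's implementation of C4.5, which picks the attribute to split on by gain ratio. The sketch below computes gain ratio for a categorical attribute on a tiny made-up sample; the helper names and the toy values are assumptions for illustration, and the code does not reproduce Weka's pruning or handling of missing values.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy sample: odor separates the classes, gill size does not.
labels = ["p", "p", "e", "e", "e", "e"]
odor = ["f", "f", "n", "n", "a", "a"]
gill_size = ["b", "n", "b", "n", "b", "n"]

print(f"odor:      {gain_ratio(odor, labels):.4f}")
print(f"gill size: {gain_ratio(gill_size, labels):.4f}")
```

In this toy sample odor scores higher than gill size, so a C4.5-style learner would test odor first.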

Page 7: Project 2 Data Mining Part 1

Simplest rule-set (Benchmark)

A mushroom is poisonous if any of these rules applies:

1. Odor = not almond or anise or none (120 poisonous cases missed, 98.52% accuracy)

2. Spore-print-color = green (48 cases missed, 99.41% accuracy)

3. Odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring = not brown (8 cases missed, 99.90% accuracy)

4. Habitat = leaves and cap-color = white (100% accuracy)

Rule 4 may also be population = clustered and cap-color = white.
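The four rules above can be written as a short ordered check, and the quoted accuracies follow from the number of missed cases out of 8124 instances. This is a minimal sketch: the function names are hypothetical, and it assumes attribute values are spelled out in full, whereas the raw data file may use single-letter codes.

```python
def is_poisonous(m):
    """Apply the four benchmark rules in order; m maps attribute name to value."""
    if m["odor"] not in {"almond", "anise", "none"}:
        return True  # rule 1
    if m["spore-print-color"] == "green":
        return True  # rule 2
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True  # rule 3
    if m["habitat"] == "leaves" and m["cap-color"] == "white":
        return True  # rule 4
    return False  # no rule fired: treated as edible


TOTAL_INSTANCES = 8124  # number of instances stated on the dataset slide


def accuracy_after(missed, total=TOTAL_INSTANCES):
    """Accuracy when `missed` poisonous cases are still misclassified."""
    return 1 - missed / total


for missed in (120, 48, 8):
    print(f"{missed} missed -> {accuracy_after(missed):.2%}")
```

The loop prints 98.52%, 99.41%, and 99.90%, matching the figures quoted after rules 1, 2, and 3.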

Page 8: Project 2 Data Mining Part 1

Habitat Insights

[Chart: habitat categories woods, grasses, leaves, meadows, paths, urban, waste]

Waste is safe, but stay away from paths.

Page 9: Project 2 Data Mining Part 1

Population Insights

[Chart: population categories abundant, clustered, numerous, scattered, several, solitary]

Mushrooms travel safer in groups

Page 10: Project 2 Data Mining Part 1

Information Knowledge

[Chart: Population Data, % Rates vs. Mushrooms. Horizontal axis: abundant, clustered, numerous, scattered, several, solitary. Vertical axis: 0% to 120%. Series: % poisonous, % edible.]

Page 11: Project 2 Data Mining Part 1

[Chart: Poisonous/Edible Ratio vs. Mushroom Population Density. Points labeled solitary, several, scattered, numerous, clustered, abundant. Horizontal axis: mushroom density, 0 to 7. Vertical axis: poisonous/edible ratio, -50% to 300%.]
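The per-category percentages and the poisonous/edible ratio plotted in these population charts can be derived from simple counts. The sketch below shows one way to compute them; the function name `class_rates` and the toy records are hypothetical and do not reflect the actual counts in the dataset.

```python
from collections import Counter


def class_rates(records):
    """records: iterable of (category, label) pairs, label 'e' or 'p'."""
    poisonous, edible = Counter(), Counter()
    for category, label in records:
        (poisonous if label == "p" else edible)[category] += 1
    result = {}
    for category in sorted(set(poisonous) | set(edible)):
        p, e = poisonous[category], edible[category]
        result[category] = {
            "pct_poisonous": p / (p + e),
            "pct_edible": e / (p + e),
            # Ratio is undefined when a category has no edible samples.
            "ratio": p / e if e else float("inf"),
        }
    return result


# Hypothetical toy records, for illustration only.
toy_records = (
    [("solitary", "e")] * 3 + [("solitary", "p")]
    + [("several", "p")] * 3 + [("several", "e")]
)

for category, stats in class_rates(toy_records).items():
    print(category, {k: f"{v:.0%}" for k, v in stats.items()})
```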

Page 12: Project 2 Data Mining Part 1

Conclusions

If it stinks, don't eat it: 98.52% accuracy.

If it doesn't stink and its spore-print color is not green, you have a 99.41% chance of survival.

Odor and spore-print color may be the best attributes statistically, but they are not the easiest to identify in the field.

Page 13: Project 2 Data Mining Part 1

Future Work

Use more easily identified attributes to classify mushrooms, producing an easier method of visual classification

Eliminate nonvisual attributes

Focus on visual-cue attributes, e.g. habitat, population, cap and stalk

Compare the two methods