Knowledge discovery process Chapter 1 Juha Vesanto Juha.Vesanto@hut.fi

Knowledge discovery process

Chapter 1

Juha Vesanto

Juha.Vesanto@hut.fi

Starting point!

Data exploration starts with data.

The real starting point!

Data exploration starts with data.

Data exploration starts with identifying a need.

Customer

Participation:

• Problem owners

• Problem holders

Motivation:

• Useful

• Profitable

The process (CRISP-DM)

The process (Pyle)

• Exploring the problem, exploring the solution, implementation specification: 20% work, 80% importance

• Preparation, survey, data modeling: 80% work, 20% importance

The problem

• Identify the right problem

• Define solvable problem(s)

• Transfer the problem understanding to the miner

Example

“I really need a model of the Monday and Friday failure rates so we can stop them!”

• What is a failure?

• How is it detected/measured?

• Is it a quality problem or just fluctuation of error rates?

• Which problem components need to be looked at?

• ...
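
One way to approach the "quality problem or just fluctuation" question is to compare each weekday's failure rate with the pooled rate and express the difference in standard errors. The following is a minimal sketch under stated assumptions, not a method given in the slides: the counts are invented for illustration, the function name `weekday_z_scores` is hypothetical, and the threshold of 2 standard errors is only a common rule of thumb.

```python
import math

# Hypothetical counts per weekday: (failures, units checked). Invented for illustration.
counts = {
    "Mon": (60, 1000),
    "Tue": (30, 1000),
    "Wed": (29, 1000),
    "Thu": (31, 1000),
    "Fri": (58, 1000),
}


def weekday_z_scores(counts):
    """Return {day: z}: deviation of each day's failure rate from the
    pooled rate, measured in binomial standard errors."""
    total_failures = sum(f for f, _ in counts.values())
    total_units = sum(n for _, n in counts.values())
    pooled = total_failures / total_units
    scores = {}
    for day, (failures, units) in counts.items():
        std_err = math.sqrt(pooled * (1 - pooled) / units)
        scores[day] = (failures / units - pooled) / std_err
    return scores


scores = weekday_z_scores(counts)
# Assumed rule of thumb: more than 2 standard errors above the pooled rate.
elevated = [day for day, z in scores.items() if z > 2.0]
print(elevated)
```

With these invented counts only Monday and Friday exceed the threshold. With real data the answer still depends on how a failure is defined and measured, which is the point of the questions above.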

The solution

What does the solution look like?

- a program used by an expert
- a data set to be referred to
- a model to be used for prediction
- a presentation / report
- ...

How (and by whom) is the solution implemented?

Data mining

• Prepare: both the data and the miner

• Survey: understand the data; is the data adequate?

• Model: refine the details; depends on the nature of the data and the solution goal

Why preparation?

GIGO (garbage in, garbage out): fix the data

Get a data set which is of maximum use:

• preserves the information

• enhanced for problem & model

PIE

Prepared Information Environment

1. prepare the training/testing data

2. transform prepared values to original

3. apply the same preparation to new data

[Diagram: data, new data, PIE-in, model, PIE-out, report]
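
The three PIE steps can be sketched in code. This is an illustrative sketch only: min-max scaling stands in for whatever preparation a project actually needs, and the class name `MinMaxPie` and the method names `fit`, `pie_in`, and `pie_out` are hypothetical, chosen to mirror the labels on the slide.

```python
class MinMaxPie:
    """Hypothetical PIE sketch: learn the preparation once, then reuse it."""

    def fit(self, values):
        # Step 1: learn the preparation parameters from the training/testing data.
        self.lo = min(values)
        self.hi = max(values)
        if self.hi == self.lo:
            raise ValueError("constant variable: min-max scaling is undefined")
        return self

    def pie_in(self, values):
        # Step 3: apply the same preparation to any data, including new data.
        span = self.hi - self.lo
        return [(v - self.lo) / span for v in values]

    def pie_out(self, prepared):
        # Step 2: transform prepared values back to the original scale.
        span = self.hi - self.lo
        return [p * span + self.lo for p in prepared]


train = [10.0, 20.0, 30.0, 50.0]       # invented example values
pie = MinMaxPie().fit(train)
prepared_train = pie.pie_in(train)     # [0.0, 0.25, 0.5, 1.0]
prepared_new = pie.pie_in([40.0])      # [0.75], scaled with the training range
restored = pie.pie_out(prepared_new)   # [40.0]
```

The design point is that new data is scaled with the parameters learned from the training data, not with its own minimum and maximum, so the model always sees values prepared in the same way.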

Why survey?

Get a broad idea of the data:

• what is covered

• what is not covered, or is covered poorly

Dangerous areas:

• bias in data

• sparse data (in a dynamic area)

Is the data adequate?
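
A very simple survey step is to count how many samples fall into each region of a variable's range and list the regions that are poorly covered. The sketch below assumes equal-width bins over a single variable; the function name `sparse_bins`, the bin count, the `min_count` threshold, and the sample values are all hypothetical choices for illustration.

```python
def sparse_bins(values, n_bins=5, min_count=3):
    """Return (low, high, count) for each equal-width bin that holds
    fewer than min_count values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant variable: cannot form bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [
        (lo + i * width, lo + (i + 1) * width, c)
        for i, c in enumerate(counts)
        if c < min_count
    ]


values = [0, 1, 1, 2, 2, 2, 3, 3, 9, 10]   # invented example data
sparse = sparse_bins(values)
for low, high, count in sparse:
    print(f"[{low}, {high}): {count} samples")
```

Here the range from 4 to 10 is covered by only two samples, so a model's behavior there would rest on very little evidence. A check like this says nothing about bias, which needs knowledge of how the data was collected.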

Modeling hype

• Universal approximator: can be applied to any data

• Data-driven: no theoretical knowledge required

Modeling definition

Model: “a representation … to show the construction or serve as a copy of something”

= makes information understandable or usable =

Modeling in data mining

Modeling is iterative:

1. Define problem

2. Select tool

3. Collect data

4. Make model

5. Apply

6. Evaluate

Traditional statistical methods: first model, then data
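
The make, apply, and evaluate steps can be sketched as a loop over candidate models, simplest first. This is only an illustration under stated assumptions: the two candidates (a constant mean and a least-squares line), the helper names `fit_mean`, `fit_line`, and `mse`, and the data are invented, and held-out mean squared error is just one possible evaluation criterion.

```python
def fit_mean(xs, ys):
    """Simplest candidate: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Least-squares straight line through the training data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# 3. Collect data (invented values; the last two points are held out).
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
test_x, test_y = [4.0, 5.0], [9.0, 11.0]

candidates = [("mean", fit_mean), ("line", fit_line)]  # simplest first
best_name, best_err = None, float("inf")
for name, fit in candidates:
    model = fit(train_x, train_y)      # 4. Make model
    err = mse(model, test_x, test_y)   # 5. Apply, 6. Evaluate
    if err < best_err:                 # keep a more complex model only if it helps
        best_name, best_err = name, err
print(best_name, best_err)
```

In practice the loop also feeds back into the earlier steps: a poor evaluation may call for a different tool, more data, or a redefined problem.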

Model types

• Active or passive

• Explanatory or predictive

• Static or continuously learning

Ten golden rules

1. Select clear problem with tangible benefit

2. Specify required solution

3. Define how solution is implemented

4. Understand the domain

5. Let the problem drive the modeling

6. Stipulate assumptions

7. Refine the model iteratively

8. Make the model as simple as possible (but no simpler)

9. Find areas of instability

10. Find areas of uncertainty

Critique

• Model evaluation is missing

• Iteration of planning stage

• Domain expert as data miner